SUMMARY


Plastid symbioses between heterotrophic hosts and algae are widespread and abundant in surface oceans. They are critically important both for extant ecological systems and for understanding the evolution of plastids. Kleptoplastidy, where the plastids of prey are temporarily retained and continuously re-acquired, provides opportunities to study the transitional states of plastid establishment. Here, we investigated the poorly studied marine centrohelid Meringosphaera and its previously unidentified symbionts using culture-independent methods on environmental samples. Investigations of the 18S rDNA from single-cell amplified genomes (SAGs) revealed uncharacterized genetic diversity within Meringosphaera that likely represents multiple species. We found that Meringosphaera harbors plastids of Dictyochophyceae (stramenopile) origin, for which we recovered six full plastid genomes and found evidence of two distinct subgroups that are congruent with host identity. Environmental monitoring by qPCR and catalyzed reporter deposition-fluorescence in situ hybridization (CARD-FISH) revealed seasonal dynamics of both host and plastid. In particular, we did not detect the plastids for 6 months of the year, which, combined with the lack of plastids in some SAGs, suggests that the plastids are temporary and the relationship is kleptoplastidic. Importantly, we found evidence of genetic integration of the kleptoplasts: we identified host-encoded plastid-associated genes whose evolutionary origins likely lie both in the plastid source and in other algal sources. This is only the second case in which host-encoded kleptoplast-targeted genes have been predicted in an ancestrally plastid-lacking group. Our results provide evidence that gene transfer and protein re-targeting are relatively early events in the evolution of plastid symbioses.


INTRODUCTION

The evolution of plastids involved gains, losses, and replacements, spreading photosynthetic capabilities to most eukaryotic supergroups and creating a complicated pattern across the tree of life. Eukaryotes first acquired photosynthesis in an ancestor of Archaeplastida through endosymbiosis with a cyanobacterium, establishing the primary plastids. Subsequent endosymbioses between eukaryotes have spread these plastids into new lineages on multiple independent occasions. In addition to the acquisition of these permanent organelles, plastids can be acquired temporarily, either through symbiotic interactions with whole microalgal cells (photosymbiosis) or by retaining plastids from prey (kleptoplastidy), which enables the host to transiently acquire photosynthetic capabilities. Kleptoplastidy is increasingly recognized as a common interaction in aquatic ecosystems, occurring in a broad range of host species, including a few multicellular eukaryotes such as sacoglossan sea slugs, some marine flatworms, and a diverse list of protist hosts (including foraminiferans, ciliates, and dinoflagellates). Kleptoplastidic interactions can be stable for a period of time and thus offer unique insights into the establishment of plastids.
Kleptoplastidy is a temporary association because host cells cannot maintain stolen plastids indefinitely. Permanent plastids depend on extensive import of host-encoded proteins to compensate for their highly reduced plastid genomes. The imported proteins have mixed origins: some are the products of endosymbiotic gene transfer (EGT), others are co-opted host proteins, and still others originate from horizontal gene transfers (HGTs). In the absence of this suite of imported proteins, kleptoplasts eventually degrade, so the host must re-acquire plastids.

Although some host species do little to counter the degradation of their kleptoplasts, other species actively support their stolen organelles and, in some instances, dramatically extend their survival. For example, the Antarctic Ross Sea dinoflagellate (RSD) can maintain its haptophyte-derived kleptoplasts for at least 30 months without replacement, probably by targeting host-encoded proteins to the kleptoplasts and thereby supplying some of the proteins they require. Interestingly, the nuclear genes encoding these kleptoplast-targeted proteins often originated from HGT rather than EGT, which aligns with the growing evidence of the importance of HGT for symbiotic integration. To date, host nuclear genes encoding kleptoplast-targeted proteins have been documented in only three cases: in RSD; in another dinoflagellate, Dinophysis acuminata, which has cryptophyte-derived kleptoplasts; and in the euglenozoan Rapaza viridis, which has green alga-derived kleptoplasts. Kleptoplast survival can also be extended by a variety of other strategies. For instance, sacoglossan sea slugs exhibit shielding strategies that protect their kleptoplasts from potentially damaging light intensities. Similarly, some benthic foraminifera have been hypothesized to use behavioral strategies to protect their kleptoplasts from high light. Alternatively, in karyoklepty, the nucleus of the prey is also retained; it remains transcriptionally active and services the kleptoplasts by providing the necessary nuclear-encoded proteins. This was first documented in the ciliate Mesodinium rubrum (also called Myrionecta rubra), in which the cryptophyte nuclei are retained for up to 30 days, and has since also been identified in the dinoflagellate Nusuttodinium aeruginosum. The diversity of host strategies used to control and extend kleptoplast survival remains an open question, as does the extent to which these strategies involve host-encoded genes.

The import of host-encoded proteins into plastids is typically considered a benchmark of permanent, stable plastids. The prediction of protein import in kleptoplastidic systems indicates that a degree of genetic integration can already be achieved in temporarily retained plastids. It has been proposed that kleptoplastidic interactions controlled by host-encoded genes could represent a "tipping point" toward stable, genetically integrated plastids. Furthermore, the acquisition of foreign genes (e.g., through HGT) by a kleptoplastidic host could provide pre-adaptations for future plastids. In this way, kleptoplastidy fits within the "shopping bag" model of plastid evolution, in which the serial uptake of symbionts provides the pre-adaptations required for permanent plastid establishment. Under this model, protein import is established before permanent plastids and acts as an evolutionary ratchet that enables plastid fixation.

In this study, we investigated a little-known marine protist, Meringosphaera, and the autofluorescent "green bodies" consistently observed inside its cells. Meringosphaera is a globally distributed genus with distinctive undulating silica spines, which can reach dominance in planktonic assemblages. Obiol et al. reported that Meringosphaera (referred to as Centrohelida-sp1) was "the most widespread and abundant ASV" (amplicon sequence variant) during the Malaspina 2010 Circumnavigation Expedition. Despite this, Meringosphaera remains remarkably understudied. First characterized in 1902, it has historically been treated as a chrysophyte-related alga because of the consistent presence of green bodies presumed to be chloroplasts. Recently, the first Meringosphaera 18S rDNA sequence was obtained, which led to its reclassification as a centrohelid (Haptista: Centroplasthelida) within the environmental marine clade NC5. Importantly, no permanent plastids are known in centrohelids, although two freshwater species have been suggested to be kleptoplastidic and endosymbiotic Chlorella has been found in the centrohelid Acanthocystis turfacea. Centrohelids are typically not considered to have been ancestrally photosynthetic. The origin of the green bodies in Meringosphaera is, therefore, unknown.

Since Meringosphaera is uncultivated, we applied a suite of culture-independent approaches to surface water collected monthly from the North Sea (west coast of Sweden) to characterize the fluorescent bodies observed in the cells. Following manual isolation of Meringosphaera cells, we generated single-cell amplified genomes (SAGs) to study the genetic identity of the plastids and to search for evidence of genetic integration via host-encoded plastid-associated genes. Specific quantitative PCR (qPCR) assays were designed for environmental monitoring of the Meringosphaera host and plastids over 14 months of sampling. Catalyzed reporter deposition-fluorescence in situ hybridization (CARD-FISH) with confocal microscopy was used to confirm host identity via the Meringosphaera 18S rRNA and to visualize the presence or absence of plastids. Overall, the results suggest that Meringosphaera exhibits selective kleptoplastidy, with evidence of host-encoded plastid-associated genes. The exact identity of the kleptoplasts depends on the season and the host lineage. Apart from the euglenozoan Rapaza viridis, this is the only known case of genetic integration in a kleptoplastidic host outside of dinoflagellates. This is particularly significant because centrohelids, unlike dinoflagellates but like Rapaza viridis, are not ancestrally plastid-bearing; thus, these host-encoded genes arose in a nuclear genome that had not previously co-evolved with a plastid.


RESULTS 


Origin and diversity of Meringosphaera plastid association

We used 15 manually isolated Meringosphaera cells, collected in October and November 2018, to perform individual genome amplification and assemble SAGs. Using the previously reported Meringosphaera 18S rDNA sequence as a query (GenBank: MZ240752), a BLASTn search found a highly similar 18S rDNA sequence (95%–99% identity) in every SAG. Using two centrohelid 28S rDNA sequences as queries (GenBank: JQ245080 and AY752993), we further identified a 28S rDNA sequence in every SAG. When placed in a concatenated 18S and 28S rDNA phylogeny alongside representative centrohelid sequences (Figure 1A), the Meringosphaera sequences and a few environmental sequences formed a well-supported monophyletic clade (bootstrap percentage [BP] = 95%). The SAGs were split into two groups ("group 1" and "group 2," BP = 100% and 92%, respectively), representing previously unknown genetic diversity.
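The Figure 1 legend notes that the two host groups correspond to clustering the Meringosphaera 18S rDNA sequences at >97% identity. The sketch below illustrates one way such a threshold can split sequences into groups: single-linkage clustering over a precomputed table of pairwise identities, implemented with union-find. The linkage rule, the function name `cluster_by_identity`, and the identity values are illustrative assumptions, not the clustering procedure or data used in this study.

```python
from itertools import combinations


def cluster_by_identity(names, identity, threshold=0.97):
    """Single-linkage clustering: two sequences share a cluster if they are
    connected by a chain of pairwise identities strictly above `threshold`.

    `identity` maps frozenset({a, b}) -> fractional identity (0-1).
    Missing pairs are treated as unlinked.
    """
    parent = {n: n for n in names}

    def find(x):
        # Path halving keeps the union-find trees shallow.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(names, 2):
        if identity.get(frozenset((a, b)), 0.0) > threshold:
            parent[find(a)] = find(b)  # merge the two clusters

    clusters = {}
    for n in names:
        clusters.setdefault(find(n), []).append(n)
    # Sort members and clusters for a deterministic result.
    return sorted(sorted(members) for members in clusters.values())


# Hypothetical identities between four SAG 18S sequences (not real data).
sags = ["S1", "S2", "S3", "S4"]
pid = {
    frozenset(("S1", "S4")): 0.995,
    frozenset(("S2", "S3")): 0.990,
    frozenset(("S1", "S2")): 0.955,
    frozenset(("S1", "S3")): 0.951,
    frozenset(("S2", "S4")): 0.957,
    frozenset(("S3", "S4")): 0.953,
}
print(cluster_by_identity(sags, pid))  # [['S1', 'S4'], ['S2', 'S3']]
```

Single linkage is only one choice; with complete or average linkage, borderline sequences can be assigned differently, so the linkage rule matters as much as the threshold itself.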

Current Biology © Cell Press
Article OPEN ACCESS

[Figure 1A: concatenated 18S and 28S rDNA tree of Centroplasthelida, Haptophyta outgroup, scale bar 0.08. Figure 1B: plastid 16S rDNA tree of Dictyochophyceae, Pelagophyceae outgroup, scale bar 0.02.]

Figure 1. Diversity and phylogeny of the Meringosphaera host and its plastid from the Swedish west coast
(A) Maximum likelihood tree of selected representative concatenated 18S and 28S rDNA sequences showing Meringosphaera within the Centroplasthelida. The tree was reconstructed with a GTR model with 4 gamma categories, and support values correspond to rapid bootstrap with 1,000 replicates.

To assess whether the green bodies commonly observed within Meringosphaera correspond to full endosymbiont cells, we searched the SAGs for any additional 18S rDNA sequence using a diverse set of reference sequences (see STAR Methods). We identified a second 18S rDNA sequence in only two SAGs (S5 and S7, both with 100% sequence similarity to the green alga Nannochloris sp.); however, given that it was rare, inconsistent across the samples, and mismatched to the plastid identity (see below), we considered it unlikely to correspond to the observed fluorescent bodies. In the remaining 13 SAGs, no 18S rDNA sequences other than that of Meringosphaera were found. This suggests the absence of another eukaryotic nucleus (from, e.g., prey or photosymbionts), and thus that the fluorescent bodies likely correspond to plastids.
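The search for additional 18S rDNA sequences amounts to asking, for each SAG contig, whether its best database hit belongs to the host or to another eukaryote. A minimal sketch, assuming a BLAST+ search exported in the default 12-column tabular format (`-outfmt 6`): it keeps the highest-bitscore hit per contig and reports contigs whose best hit is not a host reference. The reference label `ref_Nannochloris`, the `min_pident` cutoff, and the example rows are hypothetical; the study's actual search settings are given in its STAR Methods.

```python
import csv
import io

# Default column order of BLAST+ tabular output (-outfmt 6).
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def best_hits(tsv_text):
    """Return the highest-bitscore hit (as a dict) for each query contig."""
    best = {}
    reader = csv.DictReader(io.StringIO(tsv_text), fieldnames=FIELDS, delimiter="\t")
    for row in reader:
        row["pident"] = float(row["pident"])
        row["bitscore"] = float(row["bitscore"])
        q = row["qseqid"]
        if q not in best or row["bitscore"] > best[q]["bitscore"]:
            best[q] = row
    return best


def non_host_18s(tsv_text, host_refs, min_pident=90.0):
    """Contigs whose best 18S hit is a non-host reference at >= min_pident."""
    return sorted(
        (q, hit["sseqid"], hit["pident"])
        for q, hit in best_hits(tsv_text).items()
        if hit["sseqid"] not in host_refs and hit["pident"] >= min_pident
    )


# Hypothetical BLAST rows for two contigs of one SAG.
rows = [
    ["contig_1", "MZ240752", "98.7", "1650", "21", "0", "1", "1650", "10", "1659", "0.0", "2900"],
    ["contig_1", "ref_Nannochloris", "82.0", "1500", "270", "5", "1", "1500", "1", "1500", "0.0", "1400"],
    ["contig_7", "ref_Nannochloris", "100.0", "1700", "0", "0", "1", "1700", "1", "1700", "0.0", "3140"],
]
hits = "\n".join("\t".join(r) for r in rows)
print(non_host_18s(hits, host_refs={"MZ240752"}))  # [('contig_7', 'ref_Nannochloris', 100.0)]
```

Ranking by bitscore rather than percent identity avoids letting a short, near-identical local alignment outrank a full-length 18S match.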

The identity of the plastids was determined by extracting 16S rDNA sequences from the SAGs. Although all picked cells appeared to contain green fluorescent bodies, we found plastid 16S rDNA sequences in only eight of the 15 SAGs. A 16S rDNA phylogeny showed that all eight plastid sequences are closely related to plastids from Dictyochophyceae and to a few environmental sequences but are clearly distinct from the few described species (Figure 1B). The maximum likelihood plastid phylogeny showed a pattern similar to that of the host 18S rDNA phylogeny, with two distinct and well-supported clades (group 1 plastids BP = 97%, group 2 plastids BP = 100%). To confirm the pattern given by the 16S rDNA plastid phylogeny, we also constructed phylogenies for two other plastid molecular markers: the large subunit of ribulose-1,5-bisphosphate carboxylase/oxygenase (rbcL) and the D1 protein of photosystem II (psbA). Both genes were recovered in only a subset of the SAGs, which included all eight SAGs with plastid 16S rDNA sequences: rbcL in 11 SAGs and psbA in nine SAGs (summarized in Table S1). Congruent with the 16S rDNA phylogeny, the rbcL and psbA sequences from the Meringosphaera plastids were most closely related to Dictyochophyceae and partitioned into two distinct groups (Figure S1), providing additional evidence for the plastid origin and for distinct plastids in the two major host groups. Interestingly, for all three plastid genes, the branching pattern is congruent with the 18S rDNA phylogenies; i.e., the group 1 hosts have one type of plastid, whereas the group 2 hosts have the other.
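Congruence between host and plastid groupings, in the sense used here, can be stated as a simple check: across SAGs with a recovered plastid, each host group should carry exactly one plastid type, and no plastid type should occur in more than one host group. The sketch below implements that definition; the function name and the SAG assignments are hypothetical, not the study's data.

```python
from collections import defaultdict


def host_plastid_congruence(host_group, plastid_group):
    """Cross-tabulate host group against plastid group per SAG.

    Returns (table, congruent): table maps each host group to the set of
    plastid groups observed in it; SAGs without a recovered plastid are
    skipped. congruent is True when every host group carries exactly one
    plastid group and no plastid group is shared by two host groups.
    """
    table = defaultdict(set)
    for sag, host in host_group.items():
        plastid = plastid_group.get(sag)
        if plastid is not None:
            table[host].add(plastid)
    one_each = all(len(p) == 1 for p in table.values())
    distinct = len(set().union(*table.values())) == len(table)
    return dict(table), one_each and distinct


# Hypothetical assignments; S3 and S5 have no recovered plastid.
hosts = {"S1": "host 1", "S2": "host 2", "S3": "host 2", "S4": "host 1", "S5": "host 2"}
plastids = {"S1": "plastid 1", "S2": "plastid 2", "S4": "plastid 1"}
print(host_plastid_congruence(hosts, plastids))
# ({'host 1': {'plastid 1'}, 'host 2': {'plastid 2'}}, True)
```

Skipping plastid-less SAGs matters here, because those cells carry no information about which plastid type a host group takes up.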


Seasonal dynamics of Meringosphaera and its plastid 

To investigate the seasonal dynamics of Meringosphaera and its plastids, we surveyed more than a year of samples from three locations with targeted qPCR assays that distinguish between group 1 and group 2 for both the host Meringosphaera 18S rDNA and the plastid-encoded rbcL gene. The assays were applied to whole water samples, meaning that plastid rbcL detections could originate from plastids either inside Meringosphaera or inside free-living Dictyochophyceae. The qPCR assays were designed based on the SAG data from October to November 2018, and as such, the oligonucleotides could be biased toward the groups found during that time of year. The 18S rDNA results (Figure 2A) show that Meringosphaera group 2 was present throughout the year (mean ~1.1 × 10° gene copies L⁻¹), but gene abundance varied across months (ANOVA, F11,45 = 8.53, p < 0.001). The highest abundance (~1.5 × 10’ gene copies L⁻¹) was detected in April, in what could be a spring bloom. Otherwise, gene abundances were generally higher (~1.3 × 10° gene copies L⁻¹) in late summer, after which they decreased through autumn to very low levels in winter (range of 3–374 gene copies L⁻¹). By contrast, the Meringosphaera group 1 host was detected only in October and November of 2021. The presence of the group 1 host in 2021 but not 2020 could have biological causes but may also be explained by the sampling strategy, given that in 2020 only one of the three sites was sampled.
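The ANOVA reported above tests whether mean abundance differs among sampling months. Reading the reported statistic as F11,45, the degrees of freedom are k − 1 = 11 and N − k = 45, that is, 12 groups (months) and 57 observations in total. The pure-Python sketch below computes the one-way F statistic and its degrees of freedom; whether the study applied the test to raw or log-transformed copy numbers is not stated in this section, so any transformation is left to the caller, and the example values are hypothetical.

```python
def one_way_anova(groups):
    """One-way ANOVA: return (F, df_between, df_within).

    `groups` is a list of lists of observations, for example abundance
    values grouped by sampling month. Every group needs at least one value,
    and the pooled within-group variation must be non-zero.
    """
    k = len(groups)
    n = sum(len(g) for g in groups)
    means = [sum(g) / len(g) for g in groups]
    grand_mean = sum(sum(g) for g in groups) / n
    # Spread of group means around the grand mean, weighted by group size.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Spread of observations around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical values for two months (not the study's data).
print(one_way_anova([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]))  # (13.5, 1, 4)
```

A p-value then follows from the F distribution with (df_between, df_within) degrees of freedom, for example via `scipy.stats.f.sf(F, df_between, df_within)`; `scipy.stats.f_oneway` performs the whole test in one call.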

The seasonal dynamics of the plastids only partially corresponded to those of the Meringosphaera host (Figure 2A). Neither plastid group was detected from January to June, and in July, August, and September only the group 2 plastid was detected. The plastids identified from the SAGs were therefore detected only in the second half of the year. In the first half of the year, group 2 Meringosphaera was either aposymbiotic or had switched to a different plastid that our assays did not detect. Either scenario suggests that this is kleptoplastidy and not a permanent plastid. By contrast, the group 1 plastid displayed a temporal dynamic similar to that of the group 1 host, as both were detected only between October and December. However, group 1 hosts and plastids showed spatial separation, as they were detected at different sampling locations that differ in their hydrological conditions (Figures S2 and S3). The data suggest that the geographical distribution of the group 1 host and plastid is complicated, with potential migrations into our sampling locations for only 2–3 months of the year. Unfortunately, we do not have sufficient sampling data to form a complete picture of the changing distribution of group 1; future work will need to address this. Interestingly, qPCR data for 2 consecutive years in October and November (2020 and 2021) showed consistency in gene copy number and plastid detection (Figure 2A). The plastid identity at this time of year was also consistent with SAGs from earlier samples in October and November 2018. This level of consistency suggests that the fluctuations in plastid identity could repeat annually, and therefore seasonal dynamics might play an important role in Meringosphaera kleptoplastidy (although a longer time series is needed to confirm these preliminary observations). No pattern was found between basic environmental conditions and Meringosphaera host or plastid gene abundances (Figure S3).


Figure 1 (continued). The two groups are designated as group 1 and group 2; these two groups correspond to clustering the Meringosphaera 18S rDNA sequences by >97% identity (the separate 18S and 28S phylogenies are available in the Figshare repository D2.1). The Meringosphaera sequences written in bold were generated by this project, and all other previously published sequences are unidentified environmental sequences. Haptophyte sequences were used as an outgroup.
(B) Maximum likelihood tree of selected cultured and representative 16S rDNA sequences showing the Meringosphaera plastids within the Dictyochophyceae plastids. See Figure S1 for the phylogenies of the additional plastid genes psbA and rbcL. The tree was reconstructed using a TIM3+F+I+G4 model, and support values correspond to ultrafast bootstrap values from 1,000 replicates. The Meringosphaera plastid sequences written in bold were generated by this project. Pelagophyceae sequences were used as the outgroup. In both trees, support values over 50% are shown, with values over 90% represented by a black circle on the branch. See STAR Methods for details regarding the tree reconstruction.

Figure 2. Seasonal dynamics of Meringosphaera and its plastids
(A) Abundance of Meringosphaera groups 1 and 2 and their respective plastids detected in the monthly sampling from the Swedish west coast. The top panel shows the number of 18S rDNA gene copies L⁻¹ for Meringosphaera group 1 and group 2, and the lower panel shows the respective number of rbcL gene copies L⁻¹ for plastid group 1 and group 2. The group identity of the hosts and plastids is indicated by the color of the points. The abundance is reported as the log of the gene copies L⁻¹. The samples were taken monthly between March 2021 and February 2022; in addition, two extra samples were taken in October and November 2020, and these are indicated by triangle-shaped points. The number of locations sampled varies, which is why the number of points varies between months (see Figure S2 and STAR Methods for details). If the target was detected in some but not all of the technical replicates, the sample was marked as detected but not quantifiable and was plotted with a value of 1.1. Data are represented as the mean ± SEM. See also Figure S3 for the hydrographic conditions at the sampling stations.
(B–E) Images of Meringosphaera and its kleptoplasts collected in April 2021 from the A17 sampling station and visualized by CARD-FISH, showing that the plastids are numerous and intact despite the lack of detection in the qPCR assay. The cells were imaged on a confocal microscope with the 405, 488, and 633 nm lasers. In all panels, the green signal is from the Alexa 488 dye bound to the Mer482 probe, which was designed to specifically target Meringosphaera 18S rRNA; the blue signal is from the nucleic acid stain DAPI; and the red signal is from chlorophyll a autofluorescence. (B)–(E) are images taken from the same cell: (B)–(D) show each channel individually, and (E) shows the overlay of the three channels. Scale bars, 2 µm. Video S1 is composed of the z stack images taken from this same cell.

Visualization of the Meringosphaera plastids

To test whether group 2 Meringosphaera was aposymbiotic from January to June or had instead switched plastids, we used CARD-FISH and confocal microscopy to visualize individual Meringosphaera cells. We designed a specific CARD-FISH probe to identify the Meringosphaera host, which, in combination with chlorophyll autofluorescence, enabled the visualization of potential photosynthetic plastids. We used samples from the spring bloom of April 2021, a time when our qPCR assay detected high copy numbers of the group 2 Meringosphaera host (e.g., 107-10” gene copies L⁻¹) but both plastid groups were below detection.
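Detection calls such as "below detection" depend on how technical replicates are scored. The Figure 2 legend states that a sample amplified in some but not all technical replicates was marked as detected but not quantifiable (DNQ). The sketch below encodes that rule and, for fully detected samples, converts Ct values to copies per litre through a log-linear standard curve. The standard-curve approach, its slope and intercept, and the template, elution, and filtered volumes are assumptions for illustration; the study's quantification details are not given in this section.

```python
def copies_from_ct(ct, slope, intercept):
    """Invert a log-linear standard curve: Ct = slope * log10(copies) + intercept."""
    return 10 ** ((ct - intercept) / slope)


def score_sample(replicate_cts, slope, intercept,
                 template_ul, elution_ul, litres_filtered):
    """Score one sample from its technical-replicate Ct values.

    A replicate without amplification is given as None. Returns one of
    ("not detected", None), ("DNQ", None) or ("quantified", copies per litre).
    """
    hits = [ct for ct in replicate_cts if ct is not None]
    if not hits:
        return "not detected", None
    if len(hits) < len(replicate_cts):
        # Amplified in some but not all replicates: detected but not quantifiable.
        return "DNQ", None
    # Mean copies per reaction, scaled to the whole DNA extract,
    # then divided by the volume of water filtered.
    per_reaction = sum(copies_from_ct(ct, slope, intercept) for ct in hits) / len(hits)
    return "quantified", per_reaction * (elution_ul / template_ul) / litres_filtered


# Hypothetical curve and volumes; a slope of -3.32 corresponds to ~100% efficiency.
curve = dict(slope=-3.32, intercept=38.0)
volumes = dict(template_ul=2.0, elution_ul=100.0, litres_filtered=1.0)
print(score_sample([28.04, 28.04, 28.04], **curve, **volumes))  # quantified, about 5.0e4 per litre
print(score_sample([28.04, None, 28.1], **curve, **volumes))    # ('DNQ', None)
print(score_sample([None, None, None], **curve, **volumes))     # ('not detected', None)
```

Averaging copy numbers rather than Ct values is a design choice; averaging on the Ct (log) scale gives a geometric rather than arithmetic mean and slightly lower estimates when replicates disagree.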

The CARD-FISH probe hybridized only to appropriately sized cells with the expected morphology (spherical cells 4–9 µm in diameter), indicating a successful CARD-FISH procedure. We did not detect the characteristic undulating spines of Meringosphaera, but this was expected given the acidic steps in the CARD-FISH procedure (see STAR Methods). The CARD-FISH images (Figures 2B–2E) clearly show multiple intact plastids within a positively hybridized Meringosphaera cell. The chlorophyll a autofluorescence and the 18S rRNA probe signal occupied distinct regions within the Meringosphaera cell, revealing good intracellular specificity (Figures 2B and 2C). The plastids appear to be located at the periphery of the cell, in an arrangement very similar to previous observations of Meringosphaera cells with plastids (e.g., Figures 1A and 1B in Zlatogursky et al.). The z stack images (Video S1) confirmed that the plastids were positioned within the cell and not located above or below it. DAPI stained only the Meringosphaera nucleus (Figure 2D) and revealed no potential nuclei associated with the plastids, supporting the SAG analysis that did not find a second nucleus. The clear presence of plastids within some Meringosphaera cells in April 2021, together with the lack of detection by our specific qPCR assay, suggests that Meringosphaera might be able to switch plastids.


Variation in plastid integrity and gene repertoire 

Plastid contigs were recovered from a total of 11 SAGs, six of which assembled into circular genomes. The completeness of the plastid genomes varied for the other SAGs: five possessed incomplete plastid genomes, four of which were fragmented across multiple contigs, and four SAGs lacked plastid sequences (Figure 3; Table S1). The six complete plastid genomes, recovered from both Meringosphaera groups 1 and 2, were annotated and compared with the closest Dictyochophyceae for which reference plastid genomes are available (all free-living). The SAG plastid genome sizes varied according to group identity (Figure 3; Table S1). The group 1 genomes were slightly smaller, with a mean size of 83,326 bp, whereas the group 2 genomes had a mean of 89,487 bp, but the number of predicted open reading frames (ORFs) was similar between the two groups, ranging from 137 to 141. Both the genome size and the predicted gene number are lower than those of the reference Dictyochophyceae, whose plastid genomes range from 108,152 to 140,025 bp with 144–159 predicted genes.
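To put the size difference in proportion, the reported group means can be compared with both ends of the reference range. The short calculation below uses only values given in the text; "percent smaller" is defined here, as an assumption of this sketch, relative to the reference genome size.

```python
def percent_smaller(size_bp, reference_bp):
    """How much smaller size_bp is than reference_bp, as a percentage of the reference."""
    return 100 * (1 - size_bp / reference_bp)


# Values reported in the text (bp).
group_means = {"group 1": 83_326, "group 2": 89_487}
ref_min, ref_max = 108_152, 140_025  # reference Dictyochophyceae plastid genomes

for group, size in group_means.items():
    low = percent_smaller(size, ref_min)
    high = percent_smaller(size, ref_max)
    print(f"{group}: {low:.1f}% to {high:.1f}% smaller")
# group 1: 23.0% to 40.5% smaller
# group 2: 17.3% to 36.1% smaller
```

Because these are group means, individual genomes may fall slightly outside the resulting ranges.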
Figure 3. Variation in the completeness of the plastid genomes from group 1 and group 2 Meringosphaera SAGs
The plastid genome maps are shown for genomes predicted to be complete and circular. For the incomplete genomes, the number of fragments and the size of each fragment are given per SAG. The function of the predicted ORFs in the complete genomes is indicated by colour as described in the legend.


Figure 4. Comparison of the gene repertoires of the Meringosphaera plastid genomes with free-living Dictyochophyceae
(A) Presence/absence of annotated genes between the six complete SAG plastids and four free-living photosynthetic Dictyochophyceae. Only genes that are not present in all sampled genomes are shown. The full set of plastid genes is available in Figure S4. The background color of the squares highlights the identity of the plastids (pale green, group 1 SAGs; dark green, group 2 SAGs; pale blue, representative Dictyochophyceae).
(B) Bar plot showing the predicted function of the annotated genes, with color denoting the functional group as described in the legend. The same genomes are compared as in (A), and the colored circle below each genome name indicates its identity using the same color scheme as in (A).

The functional repertoire of the plastids found in Meringosphaera shows that they retain the capacity to perform photosynthesis (Figures 4 and S4). In particular, they retain the core components of both photosystem I and II (psa and psb genes), carbon fixation (rbcL and rbcS), the cytochrome b6/f complex (pet genes), ATP synthase (atp genes), and chlorophyll biosynthesis (chlI). Compared with the Dictyochophyceae plastids, six genes are lacking in all of the Meringosphaera plastids (Figure 4A). These missing genes span a range of functions: ribosomal proteins (rpl22 and rpl4), protein translocase (secA), iron-sulfur cluster assembly (sufB), and two uncharacterized genes (ycf39 and ycf66). All of these genes have been lost from the plastid genomes of other algae: for example, rpl22 has been lost within the alveolates, rpl4 and secA in Pelagophyceae, sufB in Pteridomonas (and transferred to the nucleus in green algae), ycf39 in eustigmatophytes and Ochromonas, and ycf66 in some diatoms. In addition, none of the plastids found in Meringosphaera have any of the three amino acid biosynthesis genes ilvB, ilvH, and serC, whereas the known Dictyochophyceae plastids have at least one copy. Furthermore, one gene, orf119, is present in all of the Meringosphaera plastids but not in Dictyochophyceae and encodes a 50S ribosomal protein L22-like protein. It is potentially a derivative of the rpl22 gene that is missing from all the Meringosphaera plastids. The ten genes with a shared pattern across the Meringosphaera plastids compared with the free-living Dictyochophyceae (i.e., rpl22, rpl4, secA, sufB, ycf39, ycf66, ilvB, ilvH, serC, and orf119) indicate a level of convergence in the putative Meringosphaera kleptoplasts, since plastid groups 1 and 2 are separated by free-living species.
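The "shared pattern" described above can be made precise as two set operations over gene inventories: genes present in at least one reference plastid genome but absent from every Meringosphaera plastid genome, and genes present in every Meringosphaera plastid genome but absent from all references. A sketch with deliberately tiny, hypothetical inventories (a few gene names from the text, not the full annotations):

```python
def shared_differences(query_genomes, reference_genomes):
    """Compare gene inventories (dicts of genome name -> set of gene names).

    lost:   in at least one reference genome, absent from every query genome
    gained: in every query genome, absent from every reference genome
    """
    in_all_queries = set.intersection(*query_genomes.values())
    in_any_query = set.union(*query_genomes.values())
    in_any_reference = set.union(*reference_genomes.values())
    lost = sorted(in_any_reference - in_any_query)
    gained = sorted(in_all_queries - in_any_reference)
    return lost, gained


# Hypothetical, heavily truncated inventories for illustration only.
meringosphaera = {
    "plastome_A": {"psbA", "rbcL", "sufC", "orf119"},
    "plastome_B": {"psbA", "rbcL", "sufC", "orf119", "psaE"},
}
dictyochophyceae = {
    "reference_1": {"psbA", "rbcL", "sufB", "sufC", "secA", "rpl22"},
    "reference_2": {"psbA", "rbcL", "sufB", "sufC", "serC"},
}
print(shared_differences(meringosphaera, dictyochophyceae))
# (['rpl22', 'secA', 'serC', 'sufB'], ['orf119'])
```

Absence calls are sensitive to assembly completeness: a gene missing from a fragmented assembly could be a false absence, which is one reason to restrict such comparisons to complete plastid genomes, as Figure 4A does.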
Figure 5. Predicted candidates for host-encoded plastid-associated proteins in the co-assemblies
Each column shows the data for one of the co-assemblies, and each row represents one of the host-encoded plastid-associated proteins, which have been grouped by their function. First, there are the 31 known host-encoded plastid-associated proteins that we searched for, and at the bottom, the three metabolic
centrohelid identity, and gray represents a mixed identity that includes centrohelids among others. The small box indicates the predicted targeting signal: dark
The gene content within the two main plastid groups of Meringosphaera also showed patterns of gene specificity (Figures 4A and S4). Group 1 has five genes not found in group 2: acpP, which encodes a cofactor of fatty acid synthesis; psaE, an accessory subunit of PSI; ycf60, the plastid-encoded member of the TIC20 family; and two genes of unknown function (orf271 and ycf19). The presence of the plastid-encoded TIC20 homolog ycf60 within group 1 is particularly interesting because it could be part of a machinery for protein import into the plastid. TIC20 has been found to be capable of forming an independent channel in the chloroplast inner membrane that does not require other TIC proteins to function. By contrast, group 2 had one specific gene, trnR, which encodes a tRNA for arginine and is involved in translation.


Host-encoded plastid-associated genes 

To determine whether kleptoplastidy in Meringosphaera shows evidence of host genetic integration, we searched for host-encoded genes of plastid-associated pathways. The 15 individual SAGs were co-assembled based on a 99% 18S rDNA identity threshold to improve genome coverage and to help address the biases of the multiple displacement amplification (MDA) process (Table S1; the STAR Methods detail which SAGs were co-assembled together). SAGs both with and without plastids were co-assembled. This produced three co-assembled SAGs (COSAG1, COSAG2a, and COSAG2b): COSAG1 belongs to group 1 of the 18S rDNA phylogeny, and COSAG2a and COSAG2b belong to group 2. As expected, the co-assemblies had higher completeness than the individual SAGs (Table S2). We searched for 31 well-characterized nuclear-encoded proteins that function in key plastid pathways even in non-photosynthetic species (e.g., isoprenoid synthesis, iron-sulfur cluster biosynthesis, and protein import). In addition, we searched for 35 metabolic transporters that have previously been identified as kleptoplast-targeted. For each protein candidate, we (1) examined its genomic context to ensure that it was within a Meringosphaera contig and not a contaminant, (2) identified its phylogenetic origin, and (3) predicted its subcellular localization by searching for a plastid-targeting signal (see STAR Methods for further details). For the metabolic transporters, only candidates with predicted plastid targeting were kept, to avoid including homologous copies that function in other cellular compartments.

In total, we found 22 candidates corresponding to Meringos- 
phaera host-encoded plastid-associated proteins (Figure 5; 
summarized in Table S3). These proteins were found across 
the three COSAGs and spanned a diverse range of functions. Fourteen
of the 22 candidates were predicted to have a plastid-targeting
signal with a clear N-terminal extension (Figure S5; Table S3).
Four homologs of metabolic transporters were predicted,
corresponding to three different transporters. Of particular interest is
PLT4, a probable sugar transporter predicted in COSAG2a, 
which could function to translocate fixed carbon into the cytosol. 
Dinoflagellates and stramenopiles were the main phylogenetic 
origins of the plastid-targeted proteins (six proteins were 
predicted as dinoflagellate in origin and nine proteins as
stramenopile). Since Dictyochophyceae are stramenopiles, these nine
proteins could have originated from EGT, although we could 
not determine a more precise origin to a specific stramenopile 
group. It is unclear why six proteins have a dinoflagellate origin, 
but their genes could have arisen from HGT from other food 
sources or via a hypothetical second kleptoplast during plastid 
switching. 

Some of these plastid-associated proteins are known to be 
encoded in either the plastid or nuclear genome in various spe- 
cies, so these need to be considered alongside the plastid 
genome content. For example, the iron-sulfur cluster assembly 
genes sufB and sufC are typically encoded on the plastid 
genome, whereas the remaining genes of the pathway are nu- 
clear-encoded.”” The plastid genomes of the SAGs contained 
sufC but not sufB (Figures 4 and S4), but both COSAG1 and 
COSAG2b had a nuclear candidate sufB gene. Similarly, 
although the group 1 plastid genomes encoded the plastid 
TIC20-homolog (ycf60), the nuclear TIC20 candidate was only 
found in group 2 (COSAG2a). It remains to be confirmed whether these
patterns of complementarity hold with more complete data, but
both of these examples suggest that host-encoded proteins can
substitute for missing plastid-encoded functions.
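The complementarity argument above amounts to a simple set comparison between plastid-encoded and host-encoded gene content. The sketch below is an illustration, not the analysis performed in the study: the function-to-gene table and the example gene sets are assumptions built only from the sufB, sufC and TIC20/ycf60 examples in the text, and treating ycf60 and nuclear TIC20 as interchangeable suppliers of one function is a simplification.

```python
"""Sketch: classify plastid functions by the compartment that appears to encode them.

Hypothetical illustration only: the function-to-gene table and the example
gene sets are assumptions based on the sufB/sufC/TIC20 examples in the text.
"""

# Hypothetical table: each function and the gene names that could supply it.
# ycf60 is treated here as the plastid-encoded TIC20 homolog.
FUNCTION_GENES = {
    "SufB": {"sufB"},
    "SufC": {"sufC"},
    "TIC20": {"ycf60", "TIC20"},
}


def classify_functions(plastid_genes, nuclear_genes):
    """Return, for each function, where a candidate gene was detected."""
    status = {}
    for function, names in FUNCTION_GENES.items():
        in_plastid = bool(names & plastid_genes)
        in_nucleus = bool(names & nuclear_genes)
        if in_plastid and in_nucleus:
            status[function] = "both"
        elif in_plastid:
            status[function] = "plastid only"
        elif in_nucleus:
            status[function] = "nuclear only (possible substitution)"
        else:
            status[function] = "not detected"
    return status


if __name__ == "__main__":
    # Hypothetical example: the plastid genome carries sufC only, while the
    # host co-assembly carries candidate sufB and TIC20 genes.
    for function, where in classify_functions({"sufC"}, {"sufB", "TIC20"}).items():
        print(f"{function}: {where}")
```

A function labelled "nuclear only" is exactly the pattern the text describes as possible substitution. It remains a hypothesis until more complete genomes confirm that the plastid copy is truly absent.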


DISCUSSION 


Meringosphaera is a poorly studied marine genus that can reach 
high abundance and has been known for a century to harbor 
photosynthetic green bodies. Here we identified these 
commonly observed bodies and investigated the nature of their 
partnership. Our analyses based on single-cell genomic data 
and microscopy found no evidence of endosymbiont nuclei, 
which suggests that the green bodies are isolated plastids and 
not whole endosymbiotic cells. The analysis of the SAG data re- 
vealed two main groups of Meringosphaera harboring different 
Dictyochophyceae plastids. Our monthly environmental moni- 
toring spanning over a year uncovered seasonal dynamics of 
both host and plastid. In particular, the group 2 plastid was not 
detected between January and June, which we hypothesize is 
due to plastid switching based on the observation by fluores- 
cence of plastids in Meringosphaera cells collected during this 
time (Figure 2). The lack of plastid detection, combined with the
lack of plastid markers in four of the SAGs, suggests that the
plastids in Meringosphaera are kleptoplasts stolen from
Dictyochophyceae prey. Dictyochophyceae are photosynthetic
stramenopiles with red algal-derived plastids.*° There are only two
previous reports of symbiotic Dictyochophyceae: (1) as klepto- 
plasts found in the dinoflagellate host Dinophysis mitra (also 
known as Phalacroma mitra),°° and (2) in the dinoflagellate Podo- 
lampas bipes®° where endosymbiotic cells are apparently verti- 
cally transmitted to daughter cells. Unfortunately, there are no 
plastid genomes from either of these examples, so we cannot 
compare Meringosphaera kleptoplast genomes to other dictyo- 
chophyte symbionts. 


Figure legend (partial): Green indicates predicted plastid targeting, gray indicates predicted targeting to other cellular compartments, and a cross is shown when the N terminus was incomplete and targeting could not be predicted (see Table S3 and Figure S5). The central box shows the phylogenetic origin of the candidate protein, with colors representing different groups. For the metabolic transporters, only those with predicted plastid targeting were kept. If a plastid copy of the gene was found, a green circle is shown.
The presence of kleptoplasts in Meringosphaera is consistent
with historical observations that have reported cells both with 
photosynthetic bodies*®:**:** but also more rarely without.® "°? 
Furthermore, these few available microscopic observations re- 
ported contrasting colors of the Meringosphaera photosynthetic 
bodies, including green*”°* and golden,** which is consistent 
with the presence of a cryptic diversity of hosts that harbor 
different kinds of kleptoplasts or with plastid switching. Seasonal 
dynamics within kleptoplastidy similar to the type we report here 
have been documented before. For example, temporal changes 
were found in the identity of the green algal kleptoplasts within 
the sea slug Plakobranchus ocellatus.™ The ciliate Mesodinium 
spp. was also found to switch between red and green-pig- 
mented cryptophyte plastids throughout the year.°° However, 
it has not yet been demonstrated whether plastid switching in 
kleptoplastidy can actively select the “optimum” plastid for a
given set of conditions or whether it is always passive and
responds to prey availability. The latter would be akin to the
discovery that symbiont biogeography dictates the association in
some marine photosymbioses.°° 

The plastid genomes of Meringosphaera varied in both their 
recovered integrity and gene content. The variation in integrity 
could be an artifact of the MDA process, which is known to 
lead to uneven amplification and coverage that can prevent 
proper assembly.°’ Alternatively, the variation might reflect 
different stages of kleptoplastidy as the plastids eventually 
degrade without the required nuclear support.” This could 
explain the four SAGs with no plastid sequences, which might 
have been the oldest cells containing highly degraded plastids 
with insufficient intact DNA for the MDA process to recover. 
Among the differences in gene content, the most striking was
the presence of the plastid homolog of TIC20 within the group
1 plastids. Intriguingly, a host-encoded TIC20 gene was identified
in COSAG2a. Together, these findings indicate that two of the three
co-assembly groups of Meringosphaera have a possible protein 
import channel that has been demonstrated to permit some pro- 
tein import without other subunits.°° 

The predicted host-encoded plastid-associated proteins in 
Meringosphaera are one of the few examples of putative genetic 
integration shown in kleptoplastidy.?':2*7° The candidates for 
the host-encoded plastid-associated proteins were part of
different pathways, but no pathway was completely recovered 
in our analysis of plastid pathways. This could be indicative of 
mosaic pathways, where enzymatic steps occur in different 
cellular compartments. This has, for example, been found previ- 
ously in the heme synthesis pathway of some dinoflagellates and 
apicomplexans, where the first step utilized the mitochondrial 
version of the 5-aminolevulinic acid synthase (ALAS) enzyme 
but the later steps with the hemB-E enzymes occurred sepa- 
rately in the plastid.°° It is possible that some of the pathways 
within Meringosphaera are similarly mosaic and that some 
stages occur in other cellular compartments. Alternatively, these
pathways might be genuinely incomplete, which would contribute to the
eventual degradation of the kleptoplasts, or the gaps might simply
reflect the partial nature of the SAGs.

We do not yet know whether the host-encoded plastid-asso- 
ciated proteins in Meringosphaera are successfully targeted to 
the kleptoplasts where they function. Nonetheless, finding 
host-encoded genes putatively associated with kleptoplasts is 
significant for two main reasons. First, it shows that low levels
of EGT and possibly HGT may take place in a fluctuating
kleptoplastidy where the kleptoplasts are sourced from different prey.
By contrast, Rapaza viridis and the dinoflagellate hosts with
kleptoplast-targeted proteins have remarkably stable kleptoplastidy
(over 30 months without renewal in the Ross Sea dinoflagellate, RSD),
and only one specialized prey species is known.?':2*?° Second, unlike in
dinoflagellates but as in R. viridis, centrohelids are not ancestrally
plastid-bearing, and therefore, kleptoplast-associated gene transfers
occurred in a naive genome with no pre-adaptations to plastid 
hosting. These observations suggest that low-level protein
import can occur early on as a mechanism to regulate plastid
retention in kleptoplastidy. The putative protein import into
the kleptoplasts of Meringosphaera is congruent with the
predictions of the shopping bag model of plastid origin, where serial
uptake of foreign genes from food facilitated the eventual fixation 
of a plastid.” Furthermore, the predicted kleptoplast-targeted 
metabolic transporters are consistent with the targeting-ratchet 
model,°° which hypothesizes that transporters set up an evolu- 
tionary ratchet for plastid fixation. Overall, this work supports 
the hypothesis that protein import is a relatively early event 
that helps to stabilize plastids,°*-°° and as such is a mechanism, 
rather than a consequence, of plastid establishment. 

In conclusion, Meringosphaera offers an exciting opportunity 
to examine kleptoplastidy within centrohelids and so provides 
new insights, from a relatively under-examined eukaryotic group, 
into this important process for the evolution of photosymbiosis. 
Future work needs to identify the second putative kleptoplast 
in Meringosphaera, after which it could be used as a model to 
investigate plastid switching. Moreover, as a globally distributed 
and at times highly abundant species, the study of Meringos- 
phaera is also potentially important for the marine ecosystems 
in which it plays a role. This study into the little-known centro- 
helid Meringosphaera demonstrates the insights gained into 
important evolutionary transitions by continuing to explore the 
broad range of eukaryotic diversity. 
Chemicals, peptides, and recombinant proteins
Paraformaldehyde 

SeaKem LE agarose

Lysozyme 

ProLong Diamond Antifade mountant 


Proteinase K 
TaqMan buffer 


Formamide 

Critical commercial assays 

REPLI-g UltraFast Mini kit 

ExoProStar 1-Step Kit 

DNeasy Plant kit 
5’-CATATGCTTGTCTCAAAGATTAAGCCA-3’


5’-CACACTTACWAGGAYTTCCTCGTTSAAGACG-3’ 


Software and algorithms 
TrimGalore v0.6.1 


Bbnorm (bbmap v38.08) 
SPAdes v3.13.1 


blast 2.10.1+ 
MAFFT v7.407 
Catalog number: 1.09684.1000


Catalog number: 150033 
Catalog number: US77702 
Catalog number: 69104 


Accession numbers: OQ075975 to OQ075989


Accession numbers: 
OR195151 to OR195157 & 
OR196762 to OR196769 


Accession numbers: OQ091774 to OQ091781
Accession numbers: OQ078560 to OQ078568 
Accession numbers: OQ078569 to OQ078579 
Accession numbers: OQ161668 to OQ161673


under BioProject PRJNA917255, accession 
numbers SAMN32532880 to SAMN32532894 


https://doi.org/10.6084/m9.figshare.c.6313464 
Primer name: Helio1979R
https://github.com/BioInfoTools/BBMap
REAGENT or RESOURCE SOURCE IDENTIFIER

trimAl v1.4.1 Capella-Gutiérrez et al.“ https://github.com/inab/trimal

IQ-TREE v1.6.5 Minh et al.’° http://www.iqtree.org/

ModelFinder Kalyaanamoorthy et al.’° http://www.iqtree.org/ModelFinder/

RAxML v. 8.2.12 Stamatakis’” https://cme.h-its.org/exelixis/web/software/raxml/

GetOrganelle v1.7.3.3 Jin et al.”® https://github.com/Kinggerm/GetOrganelle

MFannot NA https://megasun.bch.umontreal.ca/cgi-bin/mfannot/mfannotinterface.pl

OGDraw Greiner et al.’° https://chlorobox.mpimp-golm.mpg.de/OGDraw.html

R v.4.2.0 R Core Team”? https://cran.r-project.org/

RStudio RStudio Team”?! https://posit.co/download/rstudio-desktop/

BUSCO v5.3.1 Simão et al.” https://busco.ezlab.org/busco_userguide.html

Prodigal v2.6.3 Hyatt et al.°° https://github.com/hyattpd/Prodigal

SequenceServer 2.0.0 Priyam et al.°* https://sequenceserver.com/

BWA v0.7.8 Li and Durbin?’ https://bio-bwa.sourceforge.net/

IGV v2.4.2 Thorvaldsdottir et al.°° https://software.broadinstitute.org/software/igv/home

DEEPLOC 2.0 Thumuluri et al.°” https://services.healthtech.dtu.dk/services/DeepLoc-2.0/

TargetP 2.0 Almagro Armenteros et al.°° https://services.healthtech.dtu.dk/services/TargetP-2.0/

Other

Alexa fluor 488 Thermo Fisher Scientific Catalog number: A20000

47 mm diameter nominal pore size (5-15 µm) paper membrane filter VWR Catalog number: 516-0813

47 mm-diameter 5 µm polycarbonate hydrophilic membrane filters Sigma Aldrich (Millipore) Catalog number: TMTP04700


RESOURCE AVAILABILITY 


Lead contact 
Further information and requests for resources and reagents should be directed to and will be fulfilled by the lead contact, Megan 
Sørensen (Megan.Sorensen@hhu.de). 


Materials availability 
This study did not generate new unique reagents. 


Data and code availability 
The identified genetic sequences have been deposited at GenBank and are publicly available as of the date of publication (Meringo-
sphaera 18S rDNA sequences GenBank: OQ075975 to OQ075989, Meringosphaera 28S rDNA sequences GenBank: OR195151 to
OR195157 and OR196762 to OR196769, Meringosphaera plastid 16S rDNA sequences GenBank: OQ091774 to OQ091781, plastid
psbA sequences GenBank: OQ078560 to OQ078568, plastid rbcL sequences GenBank: OQ078569 to OQ078579, and the complete
plastid genomes GenBank: OQ161668 to OQ161673). The raw read data have been deposited at the NCBI Sequence Read Archive:
BioProject PRJNA917255, accession numbers: SAMN32532880 to SAMN32532894. All data files are available at Figshare
https://doi.org/10.6084/m9.figshare.c.6313464; this includes all the plastid contigs (both complete and incomplete), the host-encoded
plastid-associated protein candidates, single gene trees, the assembled reads of the SAGs and COSAGs, and the qPCR results.
All custom scripts used in this study are publicly available at GitHub: https://github.com/MeganSorensen/Meringosphaera_SAGs. 
Any additional information required to reanalyse the data reported in this paper is available from the lead contact upon request. 


EXPERIMENTAL MODEL AND SUBJECT DETAILS 


Meringosphaera cells were isolated from environmental samples by searching for characteristic Meringosphaera morphology: up to 
13 undulating spines typically 16-25 µm long from a spherical body of 4-9 µm in cell diameter containing up to six green/yellow bodies;
the cells are non-motile. The sampling strategy is detailed below. Once isolated, the cells were processed immediately and were
not maintained in the lab. 


Sampling 

Samples were collected from surface waters at three stations off the west coast of Sweden: Anholt East (56°40′00″ N, 12°07′00″ E),
A17 (58°16′30″ N, 10°30′48″ E) and Slaggo (58°15′30″ N, 11°26′00″ E), which are routinely sampled by the Swedish Meteorological and
Hydrological Institute (SMHI) on the Research Vessel (R/V) Svea. On one occasion, in October 2020, the A17 sampling was replaced by
sampling at a nearby station, A16 (58°16′00″ N, 10°43′30″ E), which is 12 km from A17. Whole water samples were collected from
the surface by bucket. The salinity is around 20 practical salinity units (PSU) at Anholt E, 31 PSU at A17 and 24 PSU at Slaggo. Surface 
water temperature at these locations typically ranges from ~0°C to 20°C over the year. Basic environmental data (temperature, 
salinity, dissolved nutrient concentrations) was taken from the SMHI website (https://sharkweb.smhi.se/hamta-data) corresponding 
to the dates and locations of our samplings (Figure S3). The samples were first stored on board the ship in the dark and then trans- 
ported to the laboratory in an insulated, opaque container with icepacks; the total time between collection and arrival at the laboratory 
varied between 3 and 12 days.
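As a quick arithmetic check on the stated 12 km separation between A16 and A17, the great-circle distance can be computed from the coordinates given above with the haversine formula. This is a minimal sketch that assumes a spherical Earth with a mean radius of 6371 km, which is an approximation. The helper names are illustrative.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; spherical-Earth approximation


def dms_to_deg(degrees, minutes, seconds):
    """Convert degrees-minutes-seconds to decimal degrees (N and E positive)."""
    return degrees + minutes / 60 + seconds / 3600


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# Station coordinates as given in the text.
A17 = (dms_to_deg(58, 16, 30), dms_to_deg(10, 30, 48))
A16 = (dms_to_deg(58, 16, 0), dms_to_deg(10, 43, 30))

if __name__ == "__main__":
    print(f"A16-A17 distance: {haversine_km(*A17, *A16):.1f} km")  # prints 12.4 km
```

The two stations differ mainly in longitude, so most of the roughly 12.4 km separation is east-west. This is consistent with the rounded figure in the text.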


Culturing attempts 

We made several attempts to culture Meringosphaera but none of these were successful. These attempts included the inoculation of 
1-10 Meringosphaera cells in 40 mm Petri dishes with 33 ppt artificial seawater or filter-sterilised water from the original habitat with or 
without additional nutrient media (soil extract, 0.025% cerophyl extract) at 4°C with or without light. Some dishes were separately 
cultured with Neobodo flagellates, or the mixed protist community from the original sample was added as potential food. Growth
of Meringosphaera was never observed under any of these conditions, and live cells were no longer found in the original samples after two
weeks.


Environmental Dictyochophyceae 

The environmental diversity of Dictyochophyceae is an important factor for Meringosphaera kleptoplastidy. Unfortunately, we do
not have our own data on this, but we searched for environmental Dictyochophyceae using the available resources:

First, regarding the Dictyochophyceae diversity in the environment at sampling sites, we have looked at the phytoplankton data 
taken by the SMHI monitoring program from the same stations we used (available at https://sharkweb.smhi.se/hamta-data). From 
this data, 4 Dictyochophyceae species have been identified by microscopy (Apedinella radians, Dictyocha fibula, Dictyocha specu- 
lum and Pseudopedinella pyriformis) and there were 3 additional genus-level identifications (Dictyochales, Pseudochattonella and 
Pseudopedinella). A range of Dictyochophyceae therefore appears to be present, with 2 of the 4 known orders identified.
Unfortunately, the available data are abundance counts only, not sequence data.

Secondly, we have checked the Tara Oceans databases®**:°° using the 16S rDNA plastid sequences as queries. Here there were 
some hits with 90-98.5% similarity. The upper part of this range is higher than the similarity to the known Dictyochophyceae species. 
Given that none of the Tara stations are close to our sampling locations, a lack of exact matches could be due to environmental variation,
and the top matches could represent either the free-living prey or additional Meringosphaera plastids. In addition, there are three 
environmental sequences that cluster with the Meringosphaera plastids in the 16S rDNA phylogeny, but again we do not know if these 
represent the free-living prey or not. 


METHOD DETAILS 


Single cell isolations & MDA 

The samples for single cell isolation were collected in October and November 2018 from the West coast of Sweden (stations A17 and 
Anholt E; Table S1 lists which SAGs were sampled from which stations). In the laboratory they were gravity filtered onto a 47 mm
diameter nominal pore size (6-15 µm) paper membrane filter (VWR, Radnor, Pennsylvania, USA; Cat No. 516-0813) held in a Millipore
filtration tower to avoid damaging the cells. The filters were washed in a 60 mm diameter plastic Petri dish with 10 ml of seawater. The
dishes were scanned for characteristic Meringosphaera morphology (up to 13 undulating spines typically 16-25 µm long from a
spherical body of 4-9 µm in cell diameter containing up to six green/yellow bodies; the cells are not motile) using a 40X objective
of a Nikon Eclipse Ts2R inverted microscope equipped with phase contrast. Single cells that appeared to contain green/yellow
bodies were isolated using an Eppendorf TransferMan 4r micromanipulator and pulled glass pipettes. The cells were passed through
droplets of minimal seawater to reduce contamination and frozen in 200 µl PCR tubes with minimal seawater (e.g., <5 µl). Frozen
single cells in PCR tubes were thawed and subjected to lysis and multiple displacement amplification (MDA) using the REPLI-g UltraFast
Mini kit (Qiagen, Hilden, Germany) following the manufacturer’s instructions. The MDA samples were initially screened for the
presence of Meringosphaera 18S rDNA with the centrohelid-specific primers Thx25F (5’-CATATGCTTGTCTCAAAGATTAAGCCA-3’) and
Helio1979R (5’-CACACTTACWAGGAYTTCCTCGTTSAAGACG-3’) in a PCR reaction.°° After gel electrophoresis, if a band was
present, the PCR products were purified with the ExoProStar 1-Step Kit (GE Healthcare; US77702) and Sanger-sequenced directly at
Macrogen Europe. From this screening, 15 MDA samples containing Meringosphaera 18S rDNA were selected for the next steps.
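Helio1979R contains IUPAC ambiguity codes (W, Y, S), so it matches a template wherever each template base is one of the bases allowed at that position. The sketch below is an illustration, not part of the published workflow. It checks where the forward primer Thx25F, or the reverse complement of the reverse primer Helio1979R, occurs in a DNA string. The primer sequences are taken from the text, while the template is a hypothetical construct.

```python
# IUPAC nucleotide codes: the set of bases each symbol allows.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
# Complement of each IUPAC symbol (e.g. R = A/G pairs with Y = C/T).
COMPLEMENT = {
    "A": "T", "C": "G", "G": "C", "T": "A",
    "R": "Y", "Y": "R", "S": "S", "W": "W", "K": "M", "M": "K",
    "B": "V", "V": "B", "D": "H", "H": "D", "N": "N",
}

THX25F = "CATATGCTTGTCTCAAAGATTAAGCCA"
HELIO1979R = "CACACTTACWAGGAYTTCCTCGTTSAAGACG"


def reverse_complement(seq):
    """Reverse complement that also maps IUPAC ambiguity codes."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def matches_at(primer, template, pos):
    """True if every template base from pos onwards is allowed by the primer symbol."""
    window = template[pos:pos + len(primer)]
    if len(window) < len(primer):
        return False
    return all(base in IUPAC[symbol] for symbol, base in zip(primer, window))


def find_sites(primer, template):
    """All 0-based start positions where the (degenerate) primer matches exactly."""
    template = template.upper()
    return [i for i in range(len(template) - len(primer) + 1)
            if matches_at(primer, template, i)]


if __name__ == "__main__":
    # Hypothetical template: forward site, spacer, then one concrete version of
    # the reverse primer's binding site (first allowed base at each ambiguity).
    rc_site = "".join(IUPAC[s][0] for s in reverse_complement(HELIO1979R))
    template = "GG" + THX25F + "AAAA" + rc_site + "TT"
    print(find_sites(THX25F, template), find_sites(reverse_complement(HELIO1979R), template))
```

A reverse primer anneals to the top strand's complement, so its binding site appears on the top strand as its reverse complement. That is why the second search uses `reverse_complement(HELIO1979R)`.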
Sequencing

The library preparations and sequencing were performed by SciLifeLab National Genomics Infrastructure (NGI), Stockholm, Sweden. 
Sequencing libraries were prepared with the TruSeq PCR-free library preparation, targeting an insert size of 350 bp. Sequencing was
performed in two batches: batch 1 (which became SAGs S1-S4) was multiplexed on one lane and sequenced on an Illumina NextSeq500 with
paired-end ‘Mid-output’ chemistry; batch 2 (which became SAGs S5-S15) was multiplexed on two lanes and sequenced on an Illumina
NovaSeq6000 with the paired-end ‘NovaSeqStandard’ workflow on an ‘SP’ mode flowcell.


SAG assembly 

The 15 datasets were trimmed using TrimGalore v0.6.1 (https://www.bioinformatics.babraham.ac.uk/projects/trim_galore/) with 
default parameters. The trimmed reads were then normalised with bbnorm (bbmap v38.08”°) using a minimum coverage value of 5 and a
target value of 100 to help account for the biases introduced by MDA. The normalised reads were assembled into contigs with SPAdes
(v3.13.1”') in careful mode (spades.py --careful -k auto). Basic parameters of the assemblies are listed in Table S2.
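The idea behind read normalisation can be sketched as follows. Each read's depth is estimated from the median abundance of its k-mers. Reads below a minimum depth are discarded, and reads above the target depth are randomly down-sampled so that over-amplified regions no longer dominate the assembly. This is a simplified illustration of digital normalisation in general, assuming toy reads and a small k. It is not BBNorm's exact algorithm, and the function and parameter names are hypothetical.

```python
import random
from collections import Counter
from statistics import median


def kmers(seq, k):
    """All overlapping k-mers of a read."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def normalize(reads, k=5, target=100, min_depth=5, seed=0):
    """Simplified digital normalisation (illustrative, not BBNorm's exact algorithm).

    A read's depth is the median count of its k-mers across all reads.
    Reads below min_depth are discarded; reads above target are kept with
    probability target/depth, thinning over-amplified regions towards
    roughly `target` coverage while leaving moderate-coverage regions intact.
    """
    counts = Counter(km for read in reads for km in kmers(read, k))
    rng = random.Random(seed)  # seeded so the example is reproducible
    kept = []
    for read in reads:
        read_kmers = kmers(read, k)
        if not read_kmers:  # read shorter than k: no depth estimate
            continue
        depth = median(counts[km] for km in read_kmers)
        if depth < min_depth:
            continue
        if depth <= target or rng.random() < target / depth:
            kept.append(read)
    return kept


if __name__ == "__main__":
    # Hypothetical over-amplified locus: 1000 identical reads, thinned to ~100.
    print(len(normalize(["AACCGGTTAC"] * 1000)))
```

MDA can over-amplify some loci by orders of magnitude. Capping depth in this way reduces the dominance of those loci during assembly, while leaving reads from moderately covered regions untouched.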

The contigs were made searchable as local databases with makeblastdb (blast 2.10.1+’°) and searched with blastn. For the spe- 
cific Meringosphaera 18S rDNA search GenBank accession MZ240752 was used as the query. For the general 18S rDNA search, a 
custom dataset containing diverse 18S rDNA sequences was used as the query (Figshare dataset D1.1). In the general search,
additional green algal picoeukaryote 18S rDNA sequences were identified in 2 of the SAGs (S5 and S7), but because they were inconsistent
both across the samples and with the plastid identity, we believe they represent contamination. For S7, we believe the degree of contamination was
sufficient to increase the BUSCO completeness score (see Table S2). 
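Once the contigs have been made into a BLAST database, the tabular output can be summarised per query. For example, this could be done with `makeblastdb -in contigs.fasta -dbtype nucl -out sag_db` followed by `blastn -query query.fasta -db sag_db -outfmt 6 -out hits.tsv`, where the file names are hypothetical. The sketch below keeps the best hit (highest bitscore) for each query from BLAST's standard 12-column tabular format. The e-value and length cut-offs are assumptions for illustration, not values reported in the study.

```python
import csv
import io
from typing import NamedTuple


class Hit(NamedTuple):
    query: str
    subject: str
    pident: float
    length: int
    evalue: float
    bitscore: float


def parse_outfmt6(handle):
    """Yield hits from BLAST tabular output (-outfmt 6, default 12 columns:
    qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore)."""
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue
        yield Hit(row[0], row[1], float(row[2]), int(row[3]),
                  float(row[10]), float(row[11]))


def best_hits(hits, max_evalue=1e-10, min_length=200):
    """Keep the highest-bitscore hit per query that passes the (hypothetical) filters."""
    best = {}
    for hit in hits:
        if hit.evalue > max_evalue or hit.length < min_length:
            continue
        if hit.query not in best or hit.bitscore > best[hit.query].bitscore:
            best[hit.query] = hit
    return best


if __name__ == "__main__":
    # Hypothetical hits; contig names follow the SPAdes NODE_* convention.
    example = (
        "MZ240752\tNODE_7\t99.5\t1700\t8\t0\t1\t1700\t5000\t3301\t0.0\t3100\n"
        "MZ240752\tNODE_2\t85.0\t400\t60\t2\t1\t400\t10\t409\t1e-50\t300\n"
        "psbA_ref\tNODE_9\t92.0\t150\t12\t0\t1\t150\t1\t150\t1e-20\t200\n"
    )
    for query, hit in best_hits(parse_outfmt6(io.StringIO(example))).items():
        print(query, hit.subject, hit.pident)
```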

We chose to use the 16S rDNA, psbA & rbcL genes as the plastid markers because they maximised the number of the SAGs with a 
plastid copy and are commonly sampled genes, meaning that environmental references could be included. For these plastid markers 
a custom dataset was used for each containing diverse plastid sequences (16S rDNA Figshare datasets D1.2, psbA D1.3, rbcL D1.4). 
The hits from the blast searches were aligned with MAFFT v7.407 (mafft --auto --adjustdirectionaccurately --reorder)’° and trimmed
with trimAl v1.4.1’* (trimal -gappyout -fasta). For the most part, phylogenetic trees were made with IQ-TREE v1.6.5”° with
ModelFinder’® to determine the best-fit model (for the 18S rDNA tree the TN+F+I+G model was chosen, for the plastid 16S rDNA
the TIM3+F+I+G4 model, for psbA the GTR+F+G4 model and for rbcL the GTR+F+I+G4 model). Support values are from 1000 ultrafast
bootstrap replicates and the SH-aLRT test (iqtree -m TEST -bb 1000 -alrt 1000). However, the phylogenetic trees of the concatenated
18S and 28S rDNA sequences (Figure 1A) and the separate 28S rDNA phylogeny (Figshare D2.1B) were reconstructed with
RAxML v. 8.2.12’’ using GTR models with 4 gamma categories, and support values are from 1000 rapid bootstrap replicates.
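The alignment, trimming and tree-inference steps can be chained from a script. This is a minimal sketch under stated assumptions: the file names and the helper functions are hypothetical. The flags are those quoted above, with MAFFT options written in their double-dash form. The commands are only built, not executed, unless `run_steps` is called on a system with MAFFT, trimAl and IQ-TREE installed.

```python
import subprocess
from typing import List, Optional, Tuple

# One pipeline step: (argv, optional file that captures stdout).
Step = Tuple[List[str], Optional[str]]


def phylogeny_commands(hits_fasta: str, prefix: str) -> List[Step]:
    """Build the MAFFT -> trimAl -> IQ-TREE commands for one marker (hypothetical names)."""
    aligned = f"{prefix}.aln.fasta"
    trimmed = f"{prefix}.trim.fasta"
    return [
        # MAFFT writes the alignment to stdout, so it is redirected to a file.
        (["mafft", "--auto", "--adjustdirectionaccurately", "--reorder", hits_fasta], aligned),
        (["trimal", "-in", aligned, "-out", trimmed, "-gappyout", "-fasta"], None),
        # Model selection, 1000 ultrafast bootstraps and 1000 SH-aLRT replicates.
        (["iqtree", "-s", trimmed, "-m", "TEST", "-bb", "1000", "-alrt", "1000"], None),
    ]


def run_steps(steps: List[Step]) -> None:
    """Run each step in order, stopping on the first failure."""
    for argv, stdout_path in steps:
        if stdout_path is None:
            subprocess.run(argv, check=True)
        else:
            with open(stdout_path, "w") as out:
                subprocess.run(argv, check=True, stdout=out)


if __name__ == "__main__":
    for argv, out in phylogeny_commands("psbA_hits.fasta", "psbA"):
        print(" ".join(argv) + (f" > {out}" if out else ""))
```

Building the argument lists separately from running them makes the exact commands easy to log or inspect before a long tree search starts.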

The lengths of the Meringosphaera sequences used in the phylogenetic analyses are as follows: The Meringosphaera plastid 16S
rDNA sequences (Figure 1B) were 1470-1480 bp in length. The Meringosphaera 18S rDNA sequences (Figshare D2.1A) were 1710- 
1720 bp in length. The complete Meringosphaera psbA sequences (Figure S1A) were 1059-1082 bp in length, but SAG S12 was 
incomplete and only 443 bp in length. The complete Meringosphaera rbcL sequences (Figure S1B) were 1458-1466 bp in length, 
but SAGs S7 and S15 were incomplete and only 553 bp and 844 bp in length, respectively. The Meringosphaera 28S rDNA sequences
(Figshare D2.1B) were 2292 bp in length. The concatenated 18S and 28S rDNA sequences (for Figure 1A) were 3850 bp in length, and
to obtain a sufficient number of reference centrohelid 18S and 28S sequences from the same sample, we used the long-read OTU
sequences generated by Jamy et al.”
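Concatenating the 18S and 28S rDNA alignments means joining, for each taxon, its row from each gene alignment. The sketch below assumes that both alignments are already aligned, trimmed and keyed by the same taxon names. Padding a taxon missing from one gene with gaps is a common supermatrix convention and is included here for generality. The taxon names and sequences are hypothetical.

```python
def concatenate(*alignments):
    """Join per-gene alignments (dicts of taxon -> aligned sequence) into one supermatrix.

    Taxa missing from a gene are filled with gaps of that gene's alignment length.
    """
    lengths = []
    for aln in alignments:
        row_lengths = {len(seq) for seq in aln.values()}
        if len(row_lengths) != 1:
            raise ValueError("each alignment must be non-empty with equal-length rows")
        lengths.append(row_lengths.pop())
    taxa = sorted(set().union(*alignments))
    return {
        taxon: "".join(aln.get(taxon, "-" * n) for aln, n in zip(alignments, lengths))
        for taxon in taxa
    }


if __name__ == "__main__":
    # Hypothetical toy alignments standing in for trimmed 18S and 28S rDNA.
    rdna18s = {"Meringosphaera_S2": "ACGT", "Ref_OTU_1": "AC-T"}
    rdna28s = {"Meringosphaera_S2": "GG", "Ref_OTU_2": "GA"}
    for taxon, row in concatenate(rdna18s, rdna28s).items():
        print(taxon, row)
```

Using reference sequences in which both genes come from the same sample, as the text describes, avoids gap padding and chimeric rows. This is why the long-read OTUs are useful here.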


Plastid genome assembly & comparisons 

The plastid genomes were predicted and assembled with GetOrganelle v1.7.3.3’° using the other_pt database (get_organelle_from_reads.py
-F other_pt -R 15 -k 21,45,65,85,105). In addition, GetOrganelle was run with the seed option (-s) using the identified
SAG S4 plastid genome to assist with plastid identification in the other SAGs. Six plastid genomes were assembled as circular by
GetOrganelle and are predicted to be ‘complete’ (S2, S3, S4, S6, S8, and S14). We did not verify them with PCR. Plastid contigs
were annotated with MFannot (https://megasun.bch.umontreal.ca/cgi-bin/mfannot/mfannotinterface.pl) and drawn with
OGDraw.’° The MFannot predictions of absent plastid genes within Meringosphaera plastids were confirmed by manual verification using
BLAST. For the plastid genome comparisons, the annotated SAG plastid genomes were compared to the four published photosynthetic
Dictyochophyceae plastid genomes available at the time (GenBank accession numbers: MK518352, MK518353, MK561359 &
MK561360). The analysis was conducted and visualised with R v.4.2.0°° in RStudio.°" 
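The comparison itself was done in R. The Python sketch below illustrates one way to find group-specific plastid genes of the kind reported in the Results, such as ycf60 in group 1 and trnR in group 2. The gene sets are hypothetical and heavily truncated. Only the assignment of S2 and S3 (COSAG1) to group 1 and of S8 and S14 (COSAG2b) to group 2 follows the text.

```python
def group_specific_genes(genomes, groups):
    """Genes present in every genome of a group and absent from all other genomes.

    genomes: dict of SAG name -> set of annotated plastid gene names
    groups:  dict of SAG name -> group label
    """
    result = {}
    for group in sorted(set(groups[name] for name in genomes)):
        members = [name for name in genomes if groups[name] == group]
        others = [name for name in genomes if groups[name] != group]
        core = set.intersection(*(genomes[name] for name in members))
        elsewhere = set().union(*(genomes[name] for name in others))
        result[group] = core - elsewhere
    return result


if __name__ == "__main__":
    # Hypothetical, truncated gene sets for four complete plastid genomes.
    genomes = {
        "S2": {"psbA", "rbcL", "sufC", "ycf60"},
        "S3": {"psbA", "rbcL", "sufC", "ycf60", "acpP"},
        "S8": {"psbA", "rbcL", "sufC", "trnR"},
        "S14": {"psbA", "rbcL", "sufC", "trnR"},
    }
    groups = {"S2": 1, "S3": 1, "S8": 2, "S14": 2}
    print(group_specific_genes(genomes, groups))
```

Requiring presence in every member of a group makes the result conservative. A gene missed in one incomplete assembly drops out, which matters for MDA-derived data.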


Co-assemblies & host-encoded plastid associated genes 
The MDA process is known to produce uneven coverage that can lead to errors in assembly, meaning that gene absences at the SAG 
level could be either biological or an artefact. To help address this, and to increase the coverage overall, we formed co-assemblies for 
the investigations of the host genes. First, the plastid sequences were removed from the SAGs, and the SAGs were then clustered
based on >99% identity of their Meringosphaera 18S rDNA. Within each cluster, the remaining reads were co-assembled in the same way as
described above for the single assemblies. This created three co-assemblies: COSAG1 (from SAGs S2, S3 & S6), COSAG2a (SAGs S10 & S13),
and COSAG2b (SAGs S1, S4, S5, S7, S8, S9, S11, S12, S14 & S15). In this way, the co-assemblies included SAGs both with and without plastids.
The completeness of the SAGs and COSAGs was assessed with BUSCO v5.3.1°° in genome mode with the eukaryote database of
255 markers; completeness values and basic parameters of the co-assemblies are listed in Table S2.
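Grouping SAGs into co-assemblies by 18S rDNA identity can be expressed as single-linkage clustering at a threshold. Any two SAGs above 99% identity end up in the same group, and groups that share a member are merged. The sketch below assumes the 18S sequences are already aligned. It computes identity over columns where neither sequence has a gap, which is one of several common definitions, and merges groups with a union-find structure. The sequences are hypothetical, and the published grouping may have used a different identity definition or linkage rule.

```python
from itertools import combinations


def pairwise_identity(a, b):
    """Fraction of identical bases over aligned columns where neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    compared = matches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" or y == "-":
            continue
        compared += 1
        matches += x == y
    return matches / compared if compared else 0.0


def cluster(seqs, threshold=0.99):
    """Single-linkage clusters of sequence names (identity strictly above threshold)."""
    parent = {name: name for name in seqs}

    def find(name):
        # Follow parent links to the cluster root, compressing the path as we go.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for a, b in combinations(seqs, 2):
        if pairwise_identity(seqs[a], seqs[b]) > threshold:
            parent[find(a)] = find(b)
    groups = {}
    for name in seqs:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in groups.values())


if __name__ == "__main__":
    base = "ACGT" * 50                              # hypothetical 200 bp aligned 18S fragment
    seqs = {"S1": base,
            "S2": "T" + base[1:],                   # 1 difference: 99.5% identity
            "S3": base[:100] + "TTTTT" + base[105:]}  # 4 differences: 98% identity
    print(cluster(seqs))
```

Single linkage can chain together sequences that are individually below the threshold through an intermediate sequence. With closely related SAGs, as here, that is usually acceptable, but it is worth checking on real data.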

Open reading frames (ORFs) and amino acid sequences were predicted with Prodigal v2.6.3 using default parameters. The
amino acid sequences were made searchable as local blast databases (as above). These were searched for the 31 target proteins 
using reference databases across diverse taxa as the queries (databases from Schön et al.°’: Figshare Collection
https://doi.org/10.6084/m9.figshare.c.5388176.v3; the list of proteins was also based on Hehenberger et al.”'). In addition, we searched
for 35 metabolic transporters found to be kleptoplast-targeted in the euglenozoan Rapaza viridis.°° The resulting homologs were aligned
with the reference database and IQ-TREE trees were made according to the method described above. The trees were manually inspected to identify any
homologs that grouped with known plastid-bearing species. The identity of the up- and downstream proteins surrounding these
candidates was then assessed. The surrounding proteins were searched against three different databases: BLASTp tsa_nr mode
(Transcriptome Shotgun Assembly proteins), BLASTp nr mode (non-redundant protein sequences), and SequenceServer 2.0.0?” with the
EukProt V3 database” (selected taxonomic groups: Centroplasthelida, Haptophyta and Dictyochophyceae). The top 20 hits were
taken from each of these searches and IQ-TREE trees were made of the alignments; from these trees the identity of the contigs was
established. Only candidates whose up- and downstream proteins had possible centrohelid identity were kept. Coverage was checked
manually across the candidate contigs to check for mis-assemblies, using BWA v0.7.8°° and visualised with IGV v2.4.2.°° Each 
candidate was manually checked for completeness, using alignments to a variety of homologs, and particular care was taken 
with the N and C termini. The completeness of each candidate is noted in Table S3, along with a general summary of each candidate. 
Only if the N-terminus was complete did we predict the subcellular localisation with DEEPLOC 2.0°’ and TargetP 2.0.°° For the trans- 
porters, only candidates with predicted plastid-targeting were kept because often homologous transporters function in different 
cellular compartments. Candidate proteins that had been split by Prodigal due to the presence of introns were concatenated together 
manually and when this was necessary it is noted in Table S3. 
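The filtering rules above can be written as a small decision function. This is a sketch with a hypothetical record type. The field names are assumptions for illustration and do not reflect the format of Table S3, and neither does collapsing the DeepLoc 2.0 and TargetP 2.0 outputs into a single `plastid_targeted` flag.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Candidate:
    """Hypothetical summary record for one host-encoded candidate protein."""
    name: str
    is_transporter: bool
    flanks_centrohelid: bool  # up/downstream proteins have possible centrohelid identity
    n_terminus_complete: bool
    plastid_targeted: Optional[bool] = None  # combined DeepLoc/TargetP call; None if not predicted


def localisation(c: Candidate) -> str:
    """Localisation is only predicted when the N-terminus is complete."""
    if not c.n_terminus_complete:
        return "not predicted (incomplete N-terminus)"
    return "plastid" if c.plastid_targeted else "other compartment"


def keep(c: Candidate) -> bool:
    """Apply the filtering rules described in the text."""
    if not c.flanks_centrohelid:
        return False  # contig likely belongs to a contaminant
    if c.is_transporter:
        # Transporter homologs often work in other compartments, so require plastid targeting.
        return c.n_terminus_complete and bool(c.plastid_targeted)
    return True


if __name__ == "__main__":
    candidates = [
        Candidate("sufB", False, True, False),       # kept; localisation not predicted
        Candidate("PLT4", True, True, True, True),   # transporter with plastid targeting: kept
        Candidate("TPT_x", True, True, True, False), # hypothetical transporter, other compartment: dropped
        Candidate("contam_1", False, False, True, True),  # contaminant flanks: dropped
    ]
    for c in candidates:
        print(c.name, keep(c), localisation(c))
```

Non-transporter candidates with an incomplete N-terminus are retained but flagged, which mirrors the crosses used for unpredicted targeting.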

In addition, we searched for the proteins encoded by genes identified as plastid-to-nucleus transferred within a Dictyochophyceae 
species.*° Homologs of these nine proteins (acpP, ilvB, petF, psb28, rpl12, rpl32, syfB, ycf35 and ycf42) were not found in any of the
Meringosphaera co-assemblies. However, some of these were still within the plastid genomes: in particular, acpP was found in the
group 1 plastids, and rpl12 and rpl32 were found in all of the SAG plastids (see Figures 4A and S4).


CARD-FISH 

Water samples (0.5 L) were filtered onto 5 µm, 47 mm-diameter polycarbonate hydrophilic membrane filters (Whatman) by gravity
filtration. The filters were fixed in the dark at room temperature in 2 ml of 4% paraformaldehyde for 1 hour. The filters were then rinsed three
times with 0.2 µm-filtered sea water and stored frozen at -20°C.

The CARD-FISH procedure was performed on these filters at a later date (a few months after collection). First, the filters were 
embedded in 0.2% low melting point agarose. Next, the filters were permeabilized by incubation in 10 mg ml⁻¹ lysozyme solution
for 1 hour at 35°C, followed by a MilliQ wash. The rest of the procedure followed the protocol provided by Piwosz et al.°° Briefly,
the filters were then moved to 0.01 M HCl solution for a 20-minute incubation at room temperature, before being washed in PBS
and DI water. Hybridisation was carried out at 35°C for ~20 hours, for which the 50 ng/µl probe solution was mixed with the appropriate
hybridisation buffer (see Piwosz et al.°°) in a 1:99 dilution, producing a final probe concentration of 0.5 ng/µl. The formamide
percentage in the hybridisation buffer was optimised per probe, and the final percentage used is given in Table S4. The filters were then
washed at 37°C for 30 minutes, and the concentration of NaCl in the wash buffer was determined by the formamide percentage 
of the hybridisation buffer. The filters were incubated in 0.01% PBS-Triton for 45 minutes at 37°C. They were then incubated at 
37°C in the dark with the amplification buffer and fluorochrome-labelled tyramide solution for the CARD step. Alexa fluor 488 was 
used in this process. The filters were then incubated in 0.01% PBS-Triton for 15 minutes in the dark at 37°C. The final wash occurred 
in MilliQ, and lastly ethanol (100%). The filters were then air-dried. The filters were mounted onto slides with 1pg ml”! DAPI in 
ProLong™ Diamond Antifade mountant (Invitrogen™). The slides were dried for 24 hours in the dark at room temperature and stored 
frozen at -20°C until visualization with a confocal microscope. 
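The 1:99 mixing step is plain dilution arithmetic. The minimal sketch below checks that a 50 ng µL⁻¹ probe stock mixed 1 part to 99 parts hybridisation buffer gives the stated 0.5 ng µL⁻¹. The function names and the 300 µL batch volume are hypothetical, chosen only for illustration.

```python
# Probe working concentration for CARD-FISH hybridisation (C1*V1 = C2*V2 arithmetic).

def working_concentration(stock_ng_per_ul: float, probe_parts: float, buffer_parts: float) -> float:
    """Final probe concentration (ng/uL) after mixing probe stock with hybridisation buffer.

    A 1:99 mix is 1 volume of probe stock plus 99 volumes of buffer,
    i.e. a 1/100 dilution of the stock.
    """
    return stock_ng_per_ul * probe_parts / (probe_parts + buffer_parts)


def probe_stock_volume_ul(total_mix_ul: float, probe_parts: float = 1.0, buffer_parts: float = 99.0) -> float:
    """Volume of probe stock (uL) needed to make total_mix_ul of hybridisation mix."""
    return total_mix_ul * probe_parts / (probe_parts + buffer_parts)


if __name__ == "__main__":
    print(f"final probe concentration: {working_concentration(50.0, 1, 99):.2f} ng/uL")
    # 300 uL is a hypothetical batch size, not a value from the protocol.
    print(f"probe stock for 300 uL of mix: {probe_stock_volume_ul(300.0):.1f} uL")
```

Running the script prints a final concentration of 0.50 ng/uL and a stock volume of 3.0 uL for the hypothetical 300 uL batch.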

The CARD-FISH probes used in this experiment are listed in Table S4. One of them, Mer482, was designed for this project; it targets the 18S rRNA of both group 1 and group 2 Meringosphaera (although only group 2 hosts were present in the April 2021 CARD-FISH samples). Mer482 was designed with the oligoN-design v0.1.0 pipeline and then checked for specificity against the Silva 18S rRNA ARB database. Probe performance was predicted with MathFISH, and the predicted formamide concentration was used as the starting point for optimisation.

Each sample filter was cut into 16 equal slices (~1.1 cm² each) so that the controls could be applied to the same sample. In this way, every CARD-FISH experiment included both a negative and a positive control. In the negative control, the HRP-labelled Meringosphaera-specific probe Mer482 was omitted but the Alexa Fluor 488 dye was still added, to check for non-specific binding. In the positive control, the general eukaryote probe EUK1209R (EUK1195) was used instead of the Meringosphaera probe, to confirm that the CARD-FISH procedure had worked.
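The stated slice size is consistent with dividing a full 47 mm filter into 16 pieces. As a rough check, assuming the whole filter disc is cut (the area actually covered by cells inside the filter holder is somewhat smaller):

```latex
A_{\mathrm{filter}} = \pi r^{2} = \pi\,(2.35\ \mathrm{cm})^{2} \approx 17.3\ \mathrm{cm}^{2},
\qquad
A_{\mathrm{slice}} = \frac{A_{\mathrm{filter}}}{16} \approx 1.08\ \mathrm{cm}^{2} \approx 1.1\ \mathrm{cm}^{2}
```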

Images were taken on an inverted Zeiss LSM 780 confocal microscope (Zeiss, Berlin, Germany) using the 405, 488 and 633 nm lasers and a 63×/1.4 oil-immersion objective. ZEN black 2011 software was used to control the system and process the images. The same gain and laser intensity per channel were used for all samples and controls, so that images could be compared directly.


Environmental monitoring with qPCR 

Surface water samples (0.5-1 L) were filtered by gravity filtration onto a 47 mm-diameter 5 µm filter (Whatman), or onto a 25 mm-diameter 5 µm filter (Whatman) held in a 25 mm-diameter Swinnex filter holder (Millipore, Billerica, MA, USA) using a peristaltic pump (Cole-Parmer, Vernon Hills, IL, USA). We aimed to filter 1 L of water per replicate and to have three replicates per sampling location when possible. Not every location was sampled every month; Figure S2, which splits the qPCR data by sampling location, shows which locations were sampled. The filters were snap-frozen in liquid nitrogen and stored at -80°C. DNA was extracted from the filters with the DNeasy Plant kit (Qiagen, Hilden, Germany) with the following modifications to the manufacturer's protocol. Tubes containing 400 µL AP1 buffer and the filter were subjected to three freeze/thaw cycles (alternating between immersion in liquid nitrogen and 65°C). Beads (0.65 g of a 50:50 mix of 2.8 mm and 1.4 mm zirconium oxide beads; Precellys) were then added to the tubes, which were subjected to a 2-minute bead-beating step at 8500 rpm. Proteinase K (45 µL) was added and the samples were incubated for 1 hour at 55°C. The rest of the procedure followed the Qiagen protocol, starting with the addition of RNase A. The final elution volume was 30 µL. Extracted DNA was stored at -20°C.

qPCR was performed with TaqMan chemistry (Applied Biosystems) on a StepOnePlus Real-Time PCR system (Applied Biosystems). The qPCR program was: 50°C for 2 min, 95°C for 10 min, and 45 cycles of 95°C for 15 s followed by the annealing temperature (Tm) for 1 min (the Tm for each probe set is listed in Table S4). Each 25 µL reaction contained 12.5 µL 2X TaqMan buffer (Applied Biosystems), 8 µL nuclease-free water, 0.4 µmol L⁻¹ forward primer, 0.4 µmol L⁻¹ reverse primer, 0.2 µmol L⁻¹ probe and 2 µL of template DNA. The target-specific TaqMan probe was 5′-labelled with the fluorescent reporter FAM (6-carboxyfluorescein) or VIC (2′-chloro-7′-phenyl-1,4-dichloro-6-carboxyfluorescein) (see Table S4 for details per probe) and 3′-labelled with TAMRA (6-carboxytetramethylrhodamine) as a quencher. Every plate included an 8-point standard curve run in duplicate, made from a 10-fold dilution series of synthesized gBlocks (IDT) of the target region ranging from 10⁸ to 10¹ gene copies per reaction. Every plate also included three negative-control wells in which nuclease-free water replaced the template DNA. Each sample was run in two or three technical replicates. If the target was detected in some but not all of the technical replicates, the sample was marked as detected but not quantifiable and plotted with a value of 1.1. Gene copy numbers were obtained by converting the mean cycle threshold (Ct) of the technical replicates into copy number using the standard curve.
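The quantification steps can be sketched as follows: fit Ct against log10 copies for the standards, apply the replicate rule to each sample, and convert the mean Ct to copies per reaction. This is a minimal illustration, not the StepOnePlus software's own calculation. All function names and Ct values are hypothetical, and the final conversion to copies per litre is an assumption (whole extract eluted in 30 µL, 2 µL used as template, 1 L filtered); the Methods do not spell out the exact normalisation.

```python
from statistics import mean


def fit_standard_curve(log10_copies, ct_values):
    """Ordinary least-squares fit of Ct = slope * log10(copies) + intercept."""
    x_bar, y_bar = mean(log10_copies), mean(ct_values)
    sxx = sum((x - x_bar) ** 2 for x in log10_copies)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(log10_copies, ct_values))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def efficiency(slope):
    """Amplification efficiency implied by the slope (1.0 means perfect doubling per cycle)."""
    return 10 ** (-1.0 / slope) - 1.0


def classify_sample(replicate_cts, slope, intercept, dnq_value=1.1):
    """Apply the replicate rule to one sample; None marks a replicate with no amplification.

    all replicates amplified   -> ("quantified", copies per reaction from the mean Ct)
    some but not all amplified -> ("detected, not quantifiable", dnq_value)
    no replicate amplified     -> ("not detected", None)
    """
    detected = [ct for ct in replicate_cts if ct is not None]
    if not detected:
        return "not detected", None
    if len(detected) < len(replicate_cts):
        return "detected, not quantifiable", dnq_value
    return "quantified", 10 ** ((mean(detected) - intercept) / slope)


def copies_per_litre(copies_per_reaction, template_ul=2.0, elution_ul=30.0, litres_filtered=1.0):
    """ASSUMPTION: scale copies per reaction to the whole extract, then to the volume filtered."""
    return copies_per_reaction * (elution_ul / template_ul) / litres_filtered


if __name__ == "__main__":
    # Hypothetical standards: 10^8 ... 10^1 copies, in duplicate, with made-up Ct values.
    logs = [k for k in range(8, 0, -1) for _ in range(2)]
    cts = [40.0 - 3.32 * k for k in logs]
    slope, intercept = fit_standard_curve(logs, cts)
    print(f"slope={slope:.2f} intercept={intercept:.2f} efficiency={efficiency(slope):.1%}")
    status, copies = classify_sample([26.72, 26.72, 26.72], slope, intercept)
    print(status, round(copies), round(copies_per_litre(copies)))
    print(classify_sample([30.1, None], slope, intercept))
```

Keeping the replicate rule in its own function makes the "detected but not quantifiable" decision explicit and separate from the curve fit, so the plotted placeholder value (1.1) never enters the copy-number calculation.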

The qPCR primers and probes in this study were designed with Primer Express software (Applied Biosystems), and their specificity was checked against the Silva and NCBI databases and against the alignments created from the sequencing results. Cross-reactivity between the primer/probe sets for the different groups was also tested (e.g., the 18S group 1 set was tested on group 2 standards), and no amplification occurred. The details of the primers and probes are given in Table S4.


QUANTIFICATION AND STATISTICAL ANALYSIS 
Statistical analyses were conducted with R v4.2.0 in RStudio. Within the qPCR results, the abundance of Meringosphaera group 2 was analysed by ANOVA with month as a factor (N = 57). Normality was assessed by inspecting Q-Q plots and residuals-vs-fitted-values plots in R. Where error bars are shown in a figure, the type of error is stated in the legend.
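For readers unfamiliar with the test, the sketch below computes a one-way ANOVA F statistic with month as the single factor, together with the residuals used for Q-Q and residuals-vs-fitted plots. It is an illustration in Python, not the R code used in the study; in R the same model would typically be fitted with `summary(aov(abundance ~ factor(month), data = df))`, where `df`, `abundance` and `month` are hypothetical names. The p-value is not computed here; in Python it could be obtained with `scipy.stats.f.sf(F, df_between, df_within)`.

```python
from statistics import mean


def one_way_anova(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA.

    groups maps each factor level (e.g. a month) to its list of observations.
    """
    all_obs = [y for ys in groups.values() for y in ys]
    grand = mean(all_obs)
    k, n = len(groups), len(all_obs)
    # Between-group variation: how far each month's mean lies from the grand mean.
    ss_between = sum(len(ys) * (mean(ys) - grand) ** 2 for ys in groups.values())
    # Within-group variation: spread of observations around their own month's mean.
    ss_within = sum((y - mean(ys)) ** 2 for ys in groups.values() for y in ys)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


def residuals(groups):
    """Residuals (observation minus its group mean), e.g. for a Q-Q plot."""
    return [y - mean(ys) for ys in groups.values() for y in ys]


if __name__ == "__main__":
    # Hypothetical abundance values for three months (not data from this study).
    data = {"Mar": [1.0, 2.0, 3.0], "Apr": [4.0, 5.0, 6.0], "May": [7.0, 8.0, 9.0]}
    f_stat, df1, df2 = one_way_anova(data)
    print(f"F({df1}, {df2}) = {f_stat:.1f}")  # F(2, 6) = 27.0
```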
Figure S1. Diversity and phylogeny of psbA and rbcL from the Meringosphaera plastid
sequences, related to Figure 1. A) Maximum likelihood tree of selected cultured and 
representative psbA sequences showing the Meringosphaera plastids sister to the 
Dictyochophyceae and split into two distinct groups that correspond to the groups in the 18S 
and 16S rDNA phylogenies (Figure 1 and Figshare repository D2.1A). The tree was 
reconstructed with a GTR+F+G4 model, chosen by ModelFinder. B) Maximum likelihood tree 
of selected cultured and representative rbcL sequences showing the Meringosphaera plastid 
sequences among the Dictyochophyceae and split into two distinct groups that correspond 
to the groups in the 18S and 16S rDNA phylogenies (Figure 1 and Figshare repository D2.1A). 
The tree was reconstructed with a GTR+F+I+G4 model, chosen by ModelFinder. For both 
panels, Pelagophyceae sequences were used as the outgroup, and support values 
correspond to ultrafast bootstrap values from 1000 replicates. Support values over 50% are 
shown on the tree, with values over 90% represented by a black circle on the branch. See
Methods for details of the tree reconstruction.


[Figure S2 plot: y-axis, gene copies L⁻¹ (log scale); x-axis, Time; targets: Host gr2, Plastid gr2, Host gr1, Plastid gr1; X, not sampled]


Figure S2. Seasonal dynamics of Meringosphaera and its plastid per sampling location, related to Figure 2 and STAR methods. This figure shows the same qPCR abundance data as Figure 2A, but here the data have been separated by sampling location. The upper panel shows the samples from Å17, the middle panel those from Anholt East, and the lower panel those from Släggö. Abundance is reported as gene copies L⁻¹ on a log scale, and the data are represented as the mean ± SEM. For the host, this refers to the abundance of the Meringosphaera group 1 and group 2 18S rDNA, and for the plastid to the group 1 and group 2 rbcL gene. The colour of each point indicates whether it refers to the host or the plastid, and its group identity. A cross indicates that a location was not sampled. Samples were taken monthly between March 2021 and February 2022; in addition, two extra samples were taken in October and November 2020. If the target was detected in some but not all of the technical replicates, the sample was marked as detected but not quantifiable and plotted with a value of 1.1.
[Figure S3 plot; legend (Station): Å17, Anholt East, Släggö]


Figure S3. Hydrographic conditions at the sampling locations in 2021, related to Figure 2 
and STAR methods. These data were downloaded from the SMHI website
(https://sharkweb.smhi.se/hamta-data) for the sampling locations, to show the hydrographic
conditions over the course of a year. The total phosphorus measures all the phosphorus,
both soluble and particulate, and the phosphate measures the dissolved inorganic phosphate.
The colours represent the three sampling stations: Å17, Anholt East and Släggö. Data are
represented as the mean ± SEM.
Figure S4. Full comparison of the gene repertoires of the Meringosphaera plastid genomes
with free-living Dictyochophyceae, related to Figure 4. Presence/absence of annotated
genes in the six complete SAG plastids and four free-living photosynthetic
Dictyochophyceae. This figure shows the comparison of all annotated genes, while Figure 4A 
shows only a subset. The background colour of the squares highlights the identity of the 
plastids (pale green = group 1 SAGs, dark green = group 2 SAGs and pale blue = 
representative Dictyochophyceae). 
Figure S5. Alignment of the N-terminal extension from a candidate host-encoded plastid-associated protein with predicted plastid targeting, related to Figure 5. The hemB candidate from COSAG2b (Node 2647) is at the top of the alignment, indicated by the black circle. The first ~150 amino acids are aligned with representatives of different groups: red algae and secondary red plastid-bearing groups (highlighted in red), green algae (highlighted in green), members of the Alveolates (orange), bacteria (blue), and eukaryotic homologs that are not associated with plastids (yellow). This last group includes Node 2357, also from COSAG2b, but this copy is of centrohelid origin and is predicted to be cytosolic. The alignment was performed with MAFFT as described in the Methods section and is visualised here in AliView.


Sample: S1 S2 S3 S4 S5 S6 S7 S8 S9 S10 S11 S12 S13 S14 S15
Sampling location: Anholt (S1-S7); Å17 (S8-S15)
Sampling date: 17 Oct 2018 (S1-S7); 11 Nov 2018 (S8-S15)
18S group: Gr.2 Gr.1 Gr.1 Gr.2 Gr.2 Gr.1 Gr.2 Gr.2 Gr.2 Gr.2 Gr.2 Gr.2 Gr.2 Gr.2 Gr.2
16S group: Gr.2 Gr.1 Gr.1 Gr.2 Gr.1 Gr.2 Gr.2 Gr.2
rbcL group: Gr.2 Gr.1 Gr.1 Gr.2 Gr.1 Gr.2 Gr.2 Gr.2 Gr.2 Gr.2 Gr.2
psbA group: Gr.2 Gr.1 Gr.1 Gr.2 Gr.1 Gr.2 Gr.2 Gr.2 Gr.2
COSAG: 2b 1 1 2b 2b 1 2b 2b 2b 2a 2b 2b 2a 2b 2b
Plastid complete?: No Yes Yes Yes Yes No Yes No No Yes No
No. of plastid contigs: 1 1 1 1 1 3 1 4 9 1 3
Total size of plastid contig(s) (bp): 61,040 83,372 83,264 92,424 83,341 3,009 88,018 32,534 72,676 88,018 24,491
No. of plastid ORFs: 137 137 141 137 139 139


Table S1: Summary of sample collection and analytical details for the 15 SAGs presented in this work, related to STAR Methods. Shown are the location and date of sample collection, the group identity of the host 18S rDNA, and that of the plastid 16S rDNA, rbcL and psbA sequences. If no sequence match was found, the cell is left blank. For SAGs in which plastid sequences could be identified, the number of contigs and the total size of the plastid sequences are given, together with whether the plastid genome is predicted to be complete. For the SAGs with a complete plastid genome, the predicted number of ORFs is provided.


Assembly | Accession number | GC content (%) | N50 (bp) | Assembly size (Mb) | Number of contigs | Fraction of reads that map | BUSCO score incl. plastid | BUSCO score excl. plastid
COSAG 1 | - | 42.21 | 16291 | 120.51 | 25479 | 99.95% | - | 41.2%
COSAG 2a | - | 43.43 | 9579 | 109.91 | 26840 | 99.92% | - | 36.0%
COSAG 2b | - | 42.96 | 14044 | 422.44 | 91663 | 99.96% | - | 72.1%
S1 | SAMN32532880 | 44.56 | 13555 | 43.66 | 8894 | 99.50% | 24.3% | 23.6%
S2 | SAMN32532881 | 44.69 | 22578 | 71.15 | 11871 | 99.70% | 36.5% | 36.5%
S3 | SAMN32532882 | 41.86 | 21157 | 45.86 | 7371 | 99.67% | 31.0% | 30.6%
S4 | SAMN32532883 | 45.09 | 13583 | 44.70 | 9505 | 99.45% | 17.6% | 17.6%
S5 | SAMN32532884 | 42.71 | 34128 | 69.53 | 8767 | 99.83% | 16.4% | 16.4%
S6 | SAMN32532885 | 39.03 | 24121 | 39.01 | 6421 | 99.83% | 13.4% | 13.4%
S7 | SAMN32532886 | 44.9 | 23158 | 78.77 | 12828 | 99.74% | 56.1% | 55.7%
S8 | SAMN32532887 | 40.8 | 17863 | 56.78 | 10523 | 99.81% | 18.9% | 18.5%
S9 | SAMN32532888 | 39.07 | 17727 | 57.64 | 10396 | 99.78% | 17.2% | 17.2%
S10 | SAMN32532889 | 42.02 | 11131 | 57.00 | 13094 | 99.54% | 24.7% | 24.3%
S11 | SAMN32532890 | 41.45 | 21939 | 62.37 | 9512 | 99.77% | 18.9% | 18.9%
S12 | SAMN32532891 | 45.07 | 14565 | 61.65 | 12541 | 99.69% | 21.2% | 21.2%
S13 | SAMN32532892 | 45.51 | 8555 | 67.30 | 16956 | 99.42% | 26.7% | 26.3%
S14 | SAMN32532893 | 40.44 | 16806 | 55.15 | 9881 | 99.79% | 18.1% | 17.7%
S15 | SAMN32532894 | 40.85 | 17482 | 44.25 | 7951 | 99.77% | 14.5% | 14.5%


Table S2: Assembly parameters for both the SAGs and COSAGs, related to STAR Methods. The accession numbers are shown for the raw reads that were deposited in the BioSample database under BioProject ID PRJNA917255. The average GC content, N50, number of contigs and assembly size were calculated with QUAST. Samtools was used to calculate the fraction of reads that mapped to the assemblies. The BUSCO completeness score was calculated with the eukaryotic database in genome mode. For the SAGs, BUSCO scores were calculated both including and excluding the plastid contigs, to allow direct comparison with the COSAGs, which do not contain plastid contigs. S7 is believed to have higher completeness owing to contamination (as discussed in the Methods). The co-assemblies have higher completeness than most of the individual SAGs.
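QUAST reports these statistics directly; the sketch below only illustrates how N50 and GC content are defined, using made-up contigs rather than values from Table S2. Ignoring ambiguous bases (such as N) in the GC calculation is an assumption of this sketch and may differ in detail from QUAST's implementation.

```python
def n50(contig_lengths):
    """N50: the length L such that contigs of length >= L contain at least
    half of the total assembled bases (contigs sorted longest first)."""
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("no contigs")
    half = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half:
            return length


def gc_percent(contigs):
    """GC content (%) over A/C/G/T bases only; ambiguous bases such as N are ignored (assumption)."""
    counts = {base: 0 for base in "ACGT"}
    for seq in contigs:
        for base in seq.upper():
            if base in counts:
                counts[base] += 1
    total = sum(counts.values())
    return 100.0 * (counts["G"] + counts["C"]) / total


if __name__ == "__main__":
    # Made-up contig lengths and sequences, not values from this study.
    print(n50([100, 80, 60, 40, 20]))      # 80
    print(gc_percent(["GGCCNN", "ATAT"]))  # 50.0
```

In the example, the two longest contigs (100 + 80 bp) are the first to reach half of the 300 bp total, so the N50 is 80.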


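Method | Name | Target | Temperature | Formamide in HB | Reference | Component | Sequence
qPCR | Mer18S_831P | 18S group 1 | | - | This study | Probe (FAM-TAMRA) | CATGGAATACAAATGTCCCCA
qPCR | Mer18S_860F | 18S group 1 | | - | This study | Forward primer | GTCTTCCATGAATCCAAGAATTTCA
qPCR | Mer18S_801R | 18S group 1 | | - | This study | Reverse primer | CGGACCGACGTAATGATTAATAGG
qPCR | Mer18S_622P | 18S group 2 | | - | This study | Probe (FAM-TAMRA) | TCATGTGTACGCGAGGTG
qPCR | Mer18S_602F | 18S group 2 | | - | This study | Forward primer | TGAGCGTCCGTGGCTACTG
qPCR | Mer18S_668R | 18S group 2 | | - | This study | Reverse primer | CGCACGCACTTAGTTAAAAGCA
qPCR | Mer_rbcL_1332P | rbcL group 1 | 54°C | - | This study | Probe (VIC-TAMRA) | TCCTGCAATCCTTC
qPCR | Mer_rbcL_1306F | rbcL group 1 | 54°C | - | This study | Forward primer | GAAGGTCGTGATTACGTAGCAGAA
qPCR | Mer_rbcL_1353R | rbcL group 1 | 54°C | - | This study | Reverse primer | TGAAGAGGACCACACATCTTAGCA
qPCR | Mer_rbcL_1079P | rbcL group 2 | 54°C | - | This study | Probe (VIC-TAMRA) | CCTGCCTCAAGGTT
qPCR | Mer_rbcL_1049F | rbcL group 2 | 54°C | - | This study | Forward primer | CACATTACTAGAAACGCAAACTTCAAT
qPCR | Mer_rbcL_1095R | rbcL group 2 | 54°C | - | This study | Reverse primer | AGCCCAGTCTTGTGCAAAGAA
CARD-FISH | EUK1209R (EUK1195) | General 18S rRNA | 35°C or 46°C | 30-40% | Giovannoni et al., 1988 | Probe | GGGCATCACAGACCTG
CARD-FISH | Mer482 | Meringosphaera 18S rRNA | 35°C | 40% | This study | Probe | GAATOGGIEETERATSAT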


Table S4: Oligonucleotides used in the qPCR and CARD-FISH assays, related to STAR Methods. The method column denotes whether an oligonucleotide was used for qPCR or CARD-FISH. For qPCR, the TaqMan primers and probes targeting the Meringosphaera 18S rDNA (groups 1 and 2) and the corresponding plastid rbcL (groups 1 and 2) are listed. For CARD-FISH, the general eukaryote probe used for the positive control and the probe designed for Meringosphaera 18S rRNA are listed. The temperature column gives the Tm for the qPCR target sets and the hybridisation temperature for the CARD-FISH probes. The optimised formamide concentration in the hybridisation buffer (HB) applies only to the CARD-FISH probes.